Class WekaInstancesMerge

  • All Implemented Interfaces:
    AdditionalInformationHandler, ClassCrossReference, CleanUpHandler, CrossReference, Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<Actor>, SizeOfHandler, Stoppable, StoppableWithFeedback, VariablesInspectionHandler, VariableChangeListener, Actor, ErrorHandler, InputConsumer, OutputProducer, WekaMergeInstancesActor, Serializable, Comparable

    public class WekaInstancesMerge
    extends AbstractTransformer
    implements WekaMergeInstancesActor, ClassCrossReference
    Merges multiple datasets, either loaded from files or supplied as Instances/Instance objects.
    If no unique ID attribute is specified, then all datasets must contain the same number of rows, since rows are matched by position.
    Attributes can be excluded from the final dataset via a regular expression. Attribute names can also be prefixed with a name and/or the dataset index.
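Without a unique ID attribute, row i of every dataset ends up in row i of the output, which is why the row counts have to agree. The following is a minimal, stdlib-only sketch of that rule, not the actor's implementation: it models each dataset as a hypothetical list of rows (`List<List<String>>`) instead of `weka.core.Instances`, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: merges row-aligned datasets column-wise (no unique ID attribute). */
final class PositionalMergeSketch {

  /**
   * Concatenates the i-th row of every dataset into the i-th output row.
   *
   * @throws IllegalArgumentException if the datasets differ in row count
   */
  static List<List<String>> mergeByPosition(List<List<List<String>>> datasets) {
    if (datasets.isEmpty())
      return new ArrayList<>();
    int rows = datasets.get(0).size();
    // rows are matched by position, so every dataset must have the same number of rows
    for (int d = 1; d < datasets.size(); d++) {
      if (datasets.get(d).size() != rows)
        throw new IllegalArgumentException(
            "Dataset #" + (d + 1) + " has " + datasets.get(d).size() + " rows, expected " + rows);
    }
    List<List<String>> result = new ArrayList<>();
    for (int r = 0; r < rows; r++) {
      List<String> merged = new ArrayList<>();
      for (List<List<String>> dataset : datasets)
        merged.addAll(dataset.get(r));  // append this dataset's columns for row r
      result.add(merged);
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<String>> a = List.of(List.of("1.0", "2.0"), List.of("3.0", "4.0"));
    List<List<String>> b = List.of(List.of("yes"), List.of("no"));
    System.out.println(mergeByPosition(List.of(a, b)));
  }
}
```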

    Input/output:
    - accepts:
       java.lang.String[]
       java.io.File[]
       weka.core.Instance[]
       weka.core.Instances[]
    - generates:
       weka.core.Instances


    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -name <java.lang.String> (property: name)
        The name of the actor.
        default: WekaInstancesMerge
     
    -annotation <adams.core.base.BaseAnnotation> (property: annotations)
        The annotations to attach to this actor.
        default: 
     
    -skip <boolean> (property: skip)
        If set to true, transformation is skipped and the input token is just forwarded 
        as it is.
        default: false
     
    -stop-flow-on-error <boolean> (property: stopFlowOnError)
        If set to true, the flow execution at this level gets stopped in case this 
        actor encounters an error; the error gets propagated; useful for critical 
        actors.
        default: false
     
    -silent <boolean> (property: silent)
        If enabled, then no errors are output in the console. Note: the enclosing 
        actor handler must have this enabled as well.
        default: false
     
    -use-prefix <boolean> (property: usePrefix)
        Whether to prefix the attribute names of each dataset with an index and 
        an optional string.
        default: false
     
    -add-index <boolean> (property: addIndex)
        Whether to add the index of the dataset to the prefix.
        default: false
     
    -remove <boolean> (property: remove)
        If true, only keep instances where data is available from each source.
        default: false
     
    -prefix <java.lang.String> (property: prefix)
        The optional prefix string to prefix the index number with (in case prefixes 
        are used); '@' is a placeholder for the relation name.
        default: dataset
     
    -prefix-separator <java.lang.String> (property: prefixSeparator)
        The separator string between the generated prefix and the original attribute 
        name.
        default: -
     
    -exclude-atts <java.lang.String> (property: excludedAttributes)
        The regular expression used on the attribute names, to determine whether 
        an attribute should be excluded or not (matching sense can be inverted); 
        leave empty to include all attributes.
        default: 
     
    -invert <boolean> (property: invertMatchingSense)
        Whether to invert the matching sense of excluding attributes, i.e., the regular 
        expression is used for including attributes.
        default: false
     
    -unique-id <java.lang.String> (property: uniqueID)
        The name of the attribute (string/numeric) used for uniquely identifying 
        rows among the datasets.
        default: 
     
    -keep-only-single-unique-id <boolean> (property: keepOnlySingleUniqueID)
        If enabled, only a single instance of the unique ID attribute is kept.
        default: false
     
    -strict <boolean> (property: strict)
        If enabled, ensures that IDs in the unique ID column are truly unique.
        default: false
     
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • m_UsePrefix

        protected boolean m_UsePrefix
        whether to prefix the attribute names of each dataset with an index.
      • m_AddIndex

        protected boolean m_AddIndex
        whether to add the index to the prefix.
      • m_Remove

        protected boolean m_Remove
        whether to remove rows for which not all datasets provide data.
      • m_Prefix

        protected String m_Prefix
        the additional prefix name to use, apart from the index.
      • m_PrefixSeparator

        protected String m_PrefixSeparator
        the separator between index and actual attribute name.
      • m_ExcludedAttributes

        protected String m_ExcludedAttributes
        regular expression for excluding attributes from the datasets.
      • m_InvertMatchingSense

        protected boolean m_InvertMatchingSense
        whether to invert the matching sense for excluding attributes.
      • m_UniqueID

        protected String m_UniqueID
        the string or numeric attribute to use as unique identifier for rows.
      • m_KeepOnlySingleUniqueID

        protected boolean m_KeepOnlySingleUniqueID
        whether to keep only a single instance of the unique ID attribute.
      • m_Strict

        protected boolean m_Strict
        whether to fail if IDs are not unique.
      • m_AttType

        protected int m_AttType
        the attribute type of the ID attribute.
      • m_UniqueIDAtts

        protected List<String> m_UniqueIDAtts
        the unique ID attributes.
    • Constructor Detail

      • WekaInstancesMerge

        public WekaInstancesMerge()
    • Method Detail

      • setRemove

        public void setRemove(boolean value)
        Sets whether to remove rows for which not all datasets provide data.
        Parameters:
        value - if true, rows that cannot be merged from all datasets are removed
      • getRemove

        public boolean getRemove()
        Returns whether to remove rows for which not all datasets provide data.
        Returns:
        true if rows that cannot be merged from all datasets are removed
      • removeTipText

        public String removeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
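When a unique ID is used, the `remove` property decides which IDs survive: by the tip text, only rows with data from every source are kept when it is enabled. A minimal sketch of one way to express that, assuming the surviving IDs are the intersection of the per-dataset ID sets when `remove` is true and their union otherwise; the class and method names are hypothetical and this is not the actor's code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the 'remove' option: which row IDs survive the merge. */
final class RemoveOptionSketch {

  /**
   * @param idsPerDataset the IDs found in each dataset
   * @param remove        if true, keep only IDs present in every dataset (intersection),
   *                      otherwise keep all IDs (union)
   */
  static Set<String> selectIds(List<Set<String>> idsPerDataset, boolean remove) {
    Set<String> result = new LinkedHashSet<>();
    for (Set<String> ids : idsPerDataset)
      result.addAll(ids);  // start from the union of all IDs
    if (remove) {
      for (Set<String> ids : idsPerDataset)
        result.retainAll(ids);  // drop IDs missing from any dataset
    }
    return result;
  }

  public static void main(String[] args) {
    List<Set<String>> ids = List.of(Set.of("a", "b", "c"), Set.of("b", "c", "d"));
    System.out.println(selectIds(ids, false));  // union
    System.out.println(selectIds(ids, true));   // intersection
  }
}
```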
      • setUsePrefix

        public void setUsePrefix(boolean value)
        Sets whether to use prefixes.
        Parameters:
        value - if true then the attributes will get prefixed
      • getUsePrefix

        public boolean getUsePrefix()
        Returns whether to use prefixes.
        Returns:
        true if the attributes will get prefixed
      • usePrefixTipText

        public String usePrefixTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setAddIndex

        public void setAddIndex(boolean value)
        Sets whether to add the dataset index number to the prefix.
        Parameters:
        value - if true then the index will be used in the prefix
      • getAddIndex

        public boolean getAddIndex()
        Returns whether to add the dataset index number to the prefix.
        Returns:
        true if the index will be used in the prefix
      • addIndexTipText

        public String addIndexTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setPrefix

        public void setPrefix(String value)
        Sets the optional prefix string.
        Parameters:
        value - the optional prefix string
      • getPrefix

        public String getPrefix()
        Returns the optional prefix string.
        Returns:
        the optional prefix string
      • prefixTipText

        public String prefixTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setPrefixSeparator

        public void setPrefixSeparator(String value)
        Sets the prefix separator string.
        Parameters:
        value - the prefix separator string
      • getPrefixSeparator

        public String getPrefixSeparator()
        Returns the prefix separator string.
        Returns:
        the prefix separator string
      • prefixSeparatorTipText

        public String prefixSeparatorTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setExcludedAttributes

        public void setExcludedAttributes(String value)
        Sets the regular expression for excluding attributes.
        Parameters:
        value - the regular expression
      • getExcludedAttributes

        public String getExcludedAttributes()
        Returns the regular expression for excluding attributes.
        Returns:
        the regular expression
      • excludedAttributesTipText

        public String excludedAttributesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setInvertMatchingSense

        public void setInvertMatchingSense(boolean value)
        Sets whether to invert the matching sense.
        Parameters:
        value - if true then matching sense gets inverted
      • getInvertMatchingSense

        public boolean getInvertMatchingSense()
        Returns whether to invert the matching sense.
        Returns:
        true if the matching sense gets inverted
      • invertMatchingSenseTipText

        public String invertMatchingSenseTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setUniqueID

        public void setUniqueID(String value)
        Sets the attribute (string/numeric) to use for uniquely identifying rows.
        Parameters:
        value - the attribute name
      • getUniqueID

        public String getUniqueID()
        Returns the attribute (string/numeric) to use for uniquely identifying rows.
        Returns:
        the attribute name
      • uniqueIDTipText

        public String uniqueIDTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setKeepOnlySingleUniqueID

        public void setKeepOnlySingleUniqueID(boolean value)
        Sets whether to keep only a single instance of the unique ID attribute.
        Parameters:
        value - true to keep only a single instance
      • getKeepOnlySingleUniqueID

        public boolean getKeepOnlySingleUniqueID()
        Returns whether to keep only a single instance of the unique ID attribute.
        Returns:
        true if only a single instance is kept
      • keepOnlySingleUniqueIDTipText

        public String keepOnlySingleUniqueIDTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
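Since every input dataset carries its own copy of the ID attribute, the merged header would otherwise contain one ID column per dataset. A minimal sketch of the `keepOnlySingleUniqueID` idea, assuming the first dataset's ID column is the one retained; headers are modeled as hypothetical lists of (possibly prefixed) attribute names, and all names here are illustrative rather than the actor's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of 'keepOnlySingleUniqueID': which columns end up in the merged header. */
final class KeepSingleIdSketch {

  /**
   * @param headers        the attribute names of each dataset (possibly already prefixed)
   * @param idNames        the name of the unique ID attribute within each dataset
   * @param keepOnlySingle if true, only the first dataset's ID column is kept (assumption)
   */
  static List<String> mergedHeader(List<List<String>> headers, List<String> idNames, boolean keepOnlySingle) {
    List<String> result = new ArrayList<>();
    for (int d = 0; d < headers.size(); d++) {
      for (String name : headers.get(d)) {
        // skip the ID column of every dataset after the first one
        if (keepOnlySingle && d > 0 && name.equals(idNames.get(d)))
          continue;
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<String>> headers = List.of(List.of("d1-id", "d1-a"), List.of("d2-id", "d2-b"));
    List<String> idNames = List.of("d1-id", "d2-id");
    System.out.println(mergedHeader(headers, idNames, false));
    System.out.println(mergedHeader(headers, idNames, true));
  }
}
```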
      • setStrict

        public void setStrict(boolean value)
        Sets whether to enforce uniqueness of IDs.
        Parameters:
        value - true to enforce unique IDs
      • getStrict

        public boolean getStrict()
        Returns whether to enforce uniqueness of IDs.
        Returns:
        true if unique IDs are enforced
      • strictTipText

        public String strictTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
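With `strict` enabled, a repeated value in the unique ID column of a single dataset should be treated as an error rather than silently merged. A minimal sketch of such a check, assuming IDs are compared as strings; the class and method are hypothetical. It returns null on success and an error message otherwise, mirroring the convention documented for `doExecute()`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the 'strict' option: reject duplicate IDs within one dataset. */
final class StrictIdCheckSketch {

  /** Returns null if all IDs are unique, otherwise an error message naming the first duplicate. */
  static String checkUnique(List<String> idColumn) {
    Set<String> seen = new HashSet<>();
    for (int row = 0; row < idColumn.size(); row++) {
      // Set.add returns false if the ID was already present
      if (!seen.add(idColumn.get(row)))
        return "Duplicate ID '" + idColumn.get(row) + "' in row " + (row + 1);
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(checkUnique(List.of("a", "b", "c")));
    System.out.println(checkUnique(List.of("a", "b", "a")));
  }
}
```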
      • accepts

        public Class[] accepts()
        Returns the classes that the consumer accepts.
        Specified by:
        accepts in interface InputConsumer
        Returns:
        java.lang.String[].class, java.io.File[].class, weka.core.Instance[].class, weka.core.Instances[].class
      • generates

        public Class[] generates()
        Returns the class of objects that it generates.
        Specified by:
        generates in interface OutputProducer
        Returns:
        weka.core.Instances.class
      • excludeAttributes

        protected weka.core.Instances excludeAttributes(weka.core.Instances inst)
        Excludes attributes from the data.
        Parameters:
        inst - the data to process
        Returns:
        the processed data
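The exclusion step combines the `excludedAttributes` regular expression with `invertMatchingSense`: matching names are dropped, unless the sense is inverted, in which case only matching names are kept, and an empty expression keeps everything. A minimal sketch operating on attribute names only; it assumes the whole name must match (`Matcher.matches()` rather than `find()`), which may differ from the actor, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of attribute exclusion via a regular expression. */
final class ExcludeAttributesSketch {

  /**
   * @param names  attribute names of one dataset
   * @param regexp the exclusion pattern; empty means keep everything
   * @param invert if true, the pattern selects the attributes to KEEP instead
   * @return the attribute names that remain
   */
  static List<String> keptAttributes(List<String> names, String regexp, boolean invert) {
    if (regexp.isEmpty())
      return new ArrayList<>(names);
    Pattern pattern = Pattern.compile(regexp);
    List<String> kept = new ArrayList<>();
    for (String name : names) {
      boolean matches = pattern.matcher(name).matches();  // assumption: whole-name match
      // keep non-matching names normally; keep matching names when inverted
      if (matches == invert)
        kept.add(name);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> names = List.of("id", "temp_min", "temp_max", "label");
    System.out.println(keptAttributes(names, "temp_.*", false));
    System.out.println(keptAttributes(names, "temp_.*", true));
  }
}
```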
      • createPrefix

        protected String createPrefix(weka.core.Instances inst,
                                      int index)
        Generates the prefix for the dataset/index.
        Parameters:
        inst - the current dataset
        index - the index
        Returns:
        the prefix
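The `-prefix`, `-add-index` and `-prefix-separator` options describe the parts of a generated prefix: the prefix string (with '@' standing for the relation name), optionally the dataset index, then the separator. A minimal sketch of that assembly, assuming exactly this order and a 1-based index in the output; both are assumptions, and the hypothetical method takes plain values instead of a `weka.core.Instances` object.

```java
/** Hypothetical sketch of prefix generation; the real createPrefix may differ in detail. */
final class PrefixSketch {

  /**
   * @param relationName the dataset's relation name (substituted for '@')
   * @param index        the 0-based dataset index
   * @param prefix       the prefix string, e.g. "dataset" or "@"
   * @param addIndex     whether to append the index to the prefix
   * @param separator    the separator placed before the original attribute name
   */
  static String createPrefix(String relationName, int index, String prefix, boolean addIndex, String separator) {
    // replace every literal '@' with the relation name
    StringBuilder result = new StringBuilder(prefix.replace("@", relationName));
    if (addIndex)
      result.append(index + 1);  // assumption: 1-based index in the generated prefix
    result.append(separator);
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(createPrefix("iris", 0, "dataset", true, "-"));
    System.out.println(createPrefix("iris", 0, "@", false, "-"));
  }
}
```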
      • prefixAttributes

        protected weka.core.Instances prefixAttributes(weka.core.Instances inst,
                                                       int index)
        Prefixes the attributes.
        Parameters:
        inst - the data to process
        index - the index of the dataset
        Returns:
        the processed data
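Prefixing matters because Weka requires attribute names within one dataset to be unique, so two inputs that both contain, say, an attribute `x` cannot simply be placed side by side. A minimal sketch, operating on plain name lists rather than `weka.core.Instances`, showing how a per-dataset prefix turns a clashing combined header into a unique one; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

/** Hypothetical sketch: prefixing attribute names so merged headers stay unique. */
final class PrefixAttributesSketch {

  /** Returns a copy of the names with the given prefix prepended to each one. */
  static List<String> prefixAttributes(List<String> names, String prefix) {
    List<String> result = new ArrayList<>();
    for (String name : names)
      result.add(prefix + name);
    return result;
  }

  /** Returns true if no attribute name occurs twice in the combined header. */
  static boolean namesUnique(List<String> header) {
    return new HashSet<>(header).size() == header.size();
  }

  public static void main(String[] args) {
    List<String> a = List.of("x", "y");
    List<String> b = List.of("x", "z");
    List<String> plain = new ArrayList<>(a);
    plain.addAll(b);
    List<String> prefixed = new ArrayList<>(prefixAttributes(a, "dataset1-"));
    prefixed.addAll(prefixAttributes(b, "dataset2-"));
    System.out.println(plain + " unique: " + namesUnique(plain));
    System.out.println(prefixed + " unique: " + namesUnique(prefixed));
  }
}
```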
      • prepareData

        protected weka.core.Instances prepareData(weka.core.Instances inst,
                                                  int index)
        Prepares the data before merging it: prefixing attributes, removing columns, etc.
        Parameters:
        inst - the data to process
        index - the 0-based index of the dataset being processed
        Returns:
        the prepared data
      • updateIDs

        protected void updateIDs(int instIndex,
                                 weka.core.Instances inst,
                                 HashSet ids)
        Updates the IDs in the hashset with the ones stored in the ID attribute of the provided dataset.
        Parameters:
        instIndex - the dataset index
        inst - the dataset to obtain the IDs from
        ids - the hashset to store the IDs in
      • merge

        protected weka.core.Instances merge(weka.core.Instances[] orig,
                                            weka.core.Instances[] inst,
                                            HashSet ids)
        Merges the datasets based on the collected IDs.
        Parameters:
        orig - the original datasets
        inst - the processed datasets to merge into one
        ids - the IDs for identifying the rows
        Returns:
        the merged dataset
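Merging on collected IDs amounts to a full outer join keyed on the ID: for every ID, each dataset contributes its values for that row, or missing values when it has no such row. A minimal sketch under these assumptions: each dataset is a hypothetical map from ID to its non-ID values, a single ID column leads each output row, and `null` stands in for Weka's missing value. It is not the actor's `merge` implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an ID-based merge; null stands in for a missing value. */
final class IdMergeSketch {

  /**
   * @param datasets each dataset as a map from row ID to that row's (non-ID) values
   * @param widths   the number of non-ID columns in each dataset, used for padding
   * @param ids      the IDs to emit, in output order
   */
  static List<List<String>> merge(List<Map<String, List<String>>> datasets, List<Integer> widths, Set<String> ids) {
    List<List<String>> result = new ArrayList<>();
    for (String id : ids) {
      List<String> row = new ArrayList<>();
      row.add(id);  // single ID column first
      for (int d = 0; d < datasets.size(); d++) {
        List<String> values = datasets.get(d).get(id);
        if (values != null)
          row.addAll(values);
        else
          row.addAll(Collections.nCopies(widths.get(d), null));  // no row for this ID: missing values
      }
      result.add(row);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> a = Map.of("1", List.of("x"), "2", List.of("y"));
    Map<String, List<String>> b = Map.of("2", List.of("p", "q"));
    Set<String> ids = new LinkedHashSet<>(List.of("1", "2"));
    System.out.println(merge(List.of(a, b), List.of(1, 2), ids));
  }
}
```

Which IDs go into `ids` (all of them, or only those present everywhere) is where an option such as `remove` would come in.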
      • doExecute

        protected String doExecute()
        Executes the flow item.
        Specified by:
        doExecute in class AbstractActor
        Returns:
        null if everything is fine, otherwise error message