Class WekaInstancesMerge

  • All Implemented Interfaces:
    adams.core.AdditionalInformationHandler, adams.core.ClassCrossReference, adams.core.CleanUpHandler, adams.core.CrossReference, adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.QuickInfoSupporter, adams.core.ShallowCopySupporter<adams.flow.core.Actor>, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.VariablesInspectionHandler, adams.event.VariableChangeListener, adams.flow.core.Actor, adams.flow.core.ErrorHandler, adams.flow.core.InputConsumer, adams.flow.core.OutputProducer, WekaMergeInstancesActor, Serializable, Comparable

    public class WekaInstancesMerge
    extends adams.flow.transformer.AbstractTransformer
    implements WekaMergeInstancesActor, adams.core.ClassCrossReference
    Merges multiple datasets, either from file or using Instances/Instance objects.
    If no 'ID' attribute is named, then all datasets must contain the same number of rows.
    Attributes can be excluded from ending up in the final dataset via a regular expression. They can also be prefixed with name and/or index.

    Input/output:
    - accepts:
       java.lang.String[]
       java.io.File[]
       weka.core.Instance[]
       weka.core.Instances[]
    - generates:
       weka.core.Instances
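
The row-count requirement above can be illustrated with a minimal plain-Java sketch. It assumes that, when no ID attribute is named, datasets are joined side by side by row position. The class `RowwiseMergeSketch`, the method `mergeByPosition`, and the list-of-lists data model are hypothetical and only stand in for `weka.core.Instances`; this is not the actor's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class RowwiseMergeSketch {

  /**
   * Joins datasets side by side by row position (hypothetical helper, not part of ADAMS).
   * Each dataset is a list of rows, each row a list of cell values.
   */
  public static List<List<String>> mergeByPosition(List<List<List<String>>> datasets) {
    if (datasets.isEmpty())
      return new ArrayList<>();
    // without an ID attribute, rows can only be matched by position,
    // so every dataset must have the same number of rows
    int numRows = datasets.get(0).size();
    for (int i = 1; i < datasets.size(); i++) {
      if (datasets.get(i).size() != numRows)
        throw new IllegalArgumentException(
          "Dataset at index " + i + " has " + datasets.get(i).size()
            + " rows, expected " + numRows);
    }
    List<List<String>> merged = new ArrayList<>();
    for (int r = 0; r < numRows; r++) {
      List<String> row = new ArrayList<>();
      for (List<List<String>> dataset : datasets)
        row.addAll(dataset.get(r));
      merged.add(row);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<List<String>> a = List.of(List.of("1", "x"), List.of("2", "y"));
    List<List<String>> b = List.of(List.of("10.5"), List.of("20.5"));
    System.out.println(mergeByPosition(List.of(a, b)));  // [[1, x, 10.5], [2, y, 20.5]]
  }
}
```

Naming an ID attribute via -unique-id lifts the equal-row-count requirement, since rows are then matched by ID rather than by position.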


    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -name <java.lang.String> (property: name)
        The name of the actor.
        default: WekaInstancesMerge
     
    -annotation <adams.core.base.BaseAnnotation> (property: annotations)
        The annotations to attach to this actor.
        default: 
     
    -skip <boolean> (property: skip)
        If set to true, transformation is skipped and the input token is just forwarded 
        as it is.
        default: false
     
    -stop-flow-on-error <boolean> (property: stopFlowOnError)
        If set to true, the flow execution at this level gets stopped in case this 
        actor encounters an error; the error gets propagated; useful for critical 
        actors.
        default: false
     
    -silent <boolean> (property: silent)
        If enabled, then no errors are output in the console. Note: the enclosing 
        actor handler must have this enabled as well.
        default: false
     
    -use-prefix <boolean> (property: usePrefix)
        Whether to prefix the attribute names of each dataset with an index and 
        an optional string.
        default: false
     
    -add-index <boolean> (property: addIndex)
        Whether to add the index of the dataset to the prefix.
        default: false
     
    -remove <boolean> (property: remove)
        If true, only keep instances where data is available from each source.
        default: false
     
    -prefix <java.lang.String> (property: prefix)
        The optional prefix string to prefix the index number with (in case prefixes 
        are used); '@' is a placeholder for the relation name.
        default: dataset
     
    -prefix-separator <java.lang.String> (property: prefixSeparator)
        The separator string between the generated prefix and the original attribute 
        name.
        default: -
     
    -exclude-atts <java.lang.String> (property: excludedAttributes)
        The regular expression used on the attribute names, to determine whether 
        an attribute should be excluded or not (matching sense can be inverted); 
        leave empty to include all attributes.
        default: 
     
    -invert <boolean> (property: invertMatchingSense)
        Whether to invert the matching sense of excluding attributes, i.e., the regular 
        expression is used for including attributes.
        default: false
     
    -unique-id <java.lang.String> (property: uniqueID)
        The name of the attribute (string/numeric) used for uniquely identifying 
        rows among the datasets.
        default: 
     
    -keep-only-single-unique-id <boolean> (property: keepOnlySingleUniqueID)
        If enabled, only a single instance of the unique ID attribute is kept.
        default: false
     
    -strict <boolean> (property: strict)
        If enabled, ensures that IDs in unique ID column are truly unique.
        default: false
     
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected boolean m_AddIndex
      whether to add the index to the prefix.
      protected int m_AttType
      the attribute type of the ID attribute.
      protected String m_ExcludedAttributes
      regular expression for excluding attributes from the datasets.
      protected boolean m_InvertMatchingSense
      whether to invert the matching sense for excluding attributes.
      protected boolean m_KeepOnlySingleUniqueID
      whether to keep only a single instance of the unique ID attribute.
      protected String m_Prefix
      the additional prefix name to use, apart from the index.
      protected String m_PrefixSeparator
      the separator between index and actual attribute name.
      protected boolean m_Remove
      whether to remove instances for which not all sources provide data.
      protected boolean m_Strict
      whether to fail if IDs not unique.
      protected String m_UniqueID
      the string or numeric attribute to use as unique identifier for rows.
      protected List<String> m_UniqueIDAtts
      the unique ID attributes.
      protected boolean m_UsePrefix
      whether to prefix the attribute names of each dataset with an index.
      • Fields inherited from class adams.flow.transformer.AbstractTransformer

        BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
      • Fields inherited from class adams.flow.core.AbstractActor

        m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
      • Fields inherited from class adams.core.option.AbstractOptionHandler

        m_OptionManager
      • Fields inherited from class adams.core.logging.LoggingObject

        m_Logger, m_LoggingIsEnabled, m_LoggingLevel
      • Fields inherited from interface adams.flow.core.Actor

        FILE_EXTENSION, FILE_EXTENSION_GZ
    • Method Summary

      Modifier and Type Method Description
      Class[] accepts()
      Returns the classes that the consumer accepts.
      String addIndexTipText()
      Returns the tip text for this property.
      protected String createPrefix​(weka.core.Instances inst, int index)
      Generates the prefix for the dataset/index.
      void defineOptions()
      Adds options to the internal list of options.
      protected String doExecute()
      Executes the flow item.
      protected weka.core.Instances excludeAttributes​(weka.core.Instances inst)
      Excludes attributes from the data.
      String excludedAttributesTipText()
      Returns the tip text for this property.
      Class[] generates()
      Returns the class of objects that it generates.
      boolean getAddIndex()
      Returns whether to add the dataset index number to the prefix.
      Class[] getClassCrossReferences()
      Returns the cross-referenced classes.
      String getExcludedAttributes()
      Returns the regular expression for excluding attributes.
      boolean getInvertMatchingSense()
      Returns whether to invert the matching sense.
      boolean getKeepOnlySingleUniqueID()
      Returns whether to keep only a single instance of the unique ID attribute.
      String getPrefix()
      Returns the optional prefix string.
      String getPrefixSeparator()
      Returns the prefix separator string.
      String getQuickInfo()
      Returns a quick info about the actor, which will be displayed in the GUI.
      boolean getRemove()
      Returns whether to remove instances for which not all sources provide data.
      boolean getStrict()
      Returns whether to enforce uniqueness in IDs.
      String getUniqueID()
      Returns the attribute (string/numeric) to use for uniquely identifying rows.
      boolean getUsePrefix()
      Returns whether to use prefixes.
      String globalInfo()
      Returns a string describing the object.
      String invertMatchingSenseTipText()
      Returns the tip text for this property.
      String keepOnlySingleUniqueIDTipText()
      Returns the tip text for this property.
      protected weka.core.Instances merge​(weka.core.Instances[] orig, weka.core.Instances[] inst, HashSet ids)
      Merges the datasets based on the collected IDs.
      protected weka.core.Instances prefixAttributes​(weka.core.Instances inst, int index)
      Prefixes the attributes.
      String prefixSeparatorTipText()
      Returns the tip text for this property.
      String prefixTipText()
      Returns the tip text for this property.
      protected weka.core.Instances prepareData​(weka.core.Instances inst, int index)
      Prepares the data, prefixing attributes, removing columns, etc, before merging it.
      String removeTipText()
      Returns the tip text for this property.
      void setAddIndex​(boolean value)
      Sets whether to add the dataset index number to the prefix.
      void setExcludedAttributes​(String value)
      Sets the regular expression for excluding attributes.
      void setInvertMatchingSense​(boolean value)
      Sets whether to invert the matching sense.
      void setKeepOnlySingleUniqueID​(boolean value)
      Sets whether to keep only a single instance of the unique ID attribute.
      void setPrefix​(String value)
      Sets the optional prefix string.
      void setPrefixSeparator​(String value)
      Sets the prefix separator string.
      void setRemove​(boolean value)
      Sets whether to remove instances for which not all sources provide data.
      void setStrict​(boolean value)
      Sets whether to enforce uniqueness in IDs.
      void setUniqueID​(String value)
      Sets the attribute (string/numeric) to use for uniquely identifying rows.
      void setUsePrefix​(boolean value)
      Sets whether to use prefixes.
      String strictTipText()
      Returns the tip text for this property.
      String uniqueIDTipText()
      Returns the tip text for this property.
      protected void updateIDs​(int instIndex, weka.core.Instances inst, HashSet ids)
      Updates the IDs in the hashset with the ones stored in the ID attribute of the provided dataset.
      String usePrefixTipText()
      Returns the tip text for this property.
      • Methods inherited from class adams.flow.transformer.AbstractTransformer

        backupState, currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute, restoreState, wrapUp
      • Methods inherited from class adams.flow.core.AbstractActor

        annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, initialize, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, pruneBackup, reset, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
      • Methods inherited from class adams.core.option.AbstractOptionHandler

        cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
      • Methods inherited from class adams.core.logging.LoggingObject

        getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
      • Methods inherited from interface adams.flow.core.Actor

        cleanUp, compareTo, destroy, equals, execute, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged, wrapUp
      • Methods inherited from interface adams.core.AdditionalInformationHandler

        getAdditionalInformation
      • Methods inherited from interface adams.flow.core.InputConsumer

        currentInput, hasInput, input
      • Methods inherited from interface adams.core.logging.LoggingLevelHandler

        getLoggingLevel, setLoggingLevel
      • Methods inherited from interface adams.core.logging.LoggingSupporter

        getLogger, isLoggingEnabled
      • Methods inherited from interface adams.core.option.OptionHandler

        cleanUpOptions, getOptionManager
      • Methods inherited from interface adams.flow.core.OutputProducer

        hasPendingOutput, output
      • Methods inherited from interface adams.core.VariablesInspectionHandler

        canInspectOptions
    • Field Detail

      • m_UsePrefix

        protected boolean m_UsePrefix
        whether to prefix the attribute names of each dataset with an index.
      • m_AddIndex

        protected boolean m_AddIndex
        whether to add the index to the prefix.
      • m_Remove

        protected boolean m_Remove
        whether to remove instances for which not all sources provide data.
      • m_Prefix

        protected String m_Prefix
        the additional prefix name to use, apart from the index.
      • m_PrefixSeparator

        protected String m_PrefixSeparator
        the separator between index and actual attribute name.
      • m_ExcludedAttributes

        protected String m_ExcludedAttributes
        regular expression for excluding attributes from the datasets.
      • m_InvertMatchingSense

        protected boolean m_InvertMatchingSense
        whether to invert the matching sense for excluding attributes.
      • m_UniqueID

        protected String m_UniqueID
        the string or numeric attribute to use as unique identifier for rows.
      • m_KeepOnlySingleUniqueID

        protected boolean m_KeepOnlySingleUniqueID
        whether to keep only a single instance of the unique ID attribute.
      • m_Strict

        protected boolean m_Strict
        whether to fail if IDs not unique.
      • m_AttType

        protected int m_AttType
        the attribute type of the ID attribute.
      • m_UniqueIDAtts

        protected List<String> m_UniqueIDAtts
        the unique ID attributes.
    • Constructor Detail

      • WekaInstancesMerge

        public WekaInstancesMerge()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the object.
        Specified by:
        globalInfo in interface adams.core.GlobalInfoSupporter
        Specified by:
        globalInfo in class adams.core.option.AbstractOptionHandler
        Returns:
        a description suitable for displaying in the gui
      • getClassCrossReferences

        public Class[] getClassCrossReferences()
        Returns the cross-referenced classes.
        Specified by:
        getClassCrossReferences in interface adams.core.ClassCrossReference
        Returns:
        the classes
      • defineOptions

        public void defineOptions()
        Adds options to the internal list of options.
        Specified by:
        defineOptions in interface adams.core.option.OptionHandler
        Overrides:
        defineOptions in class adams.flow.core.AbstractActor
      • setRemove

        public void setRemove​(boolean value)
        Sets whether to remove instances for which not all sources provide data.
        Parameters:
        value - if true, an instance is removed when not all datasets provide data for it
      • getRemove

        public boolean getRemove()
        Returns whether to remove instances for which not all sources provide data.
        Returns:
        true if an instance is removed when not all datasets provide data for it
      • removeTipText

        public String removeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setUsePrefix

        public void setUsePrefix​(boolean value)
        Sets whether to use prefixes.
        Parameters:
        value - if true then the attributes will get prefixed
      • getUsePrefix

        public boolean getUsePrefix()
        Returns whether to use prefixes.
        Returns:
        true if the attributes will get prefixed
      • usePrefixTipText

        public String usePrefixTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setAddIndex

        public void setAddIndex​(boolean value)
        Sets whether to add the dataset index number to the prefix.
        Parameters:
        value - if true then the index will be used in the prefix
      • getAddIndex

        public boolean getAddIndex()
        Returns whether to add the dataset index number to the prefix.
        Returns:
        true if the index will be used in the prefix
      • addIndexTipText

        public String addIndexTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setPrefix

        public void setPrefix​(String value)
        Sets the optional prefix string.
        Parameters:
        value - the optional prefix string
      • getPrefix

        public String getPrefix()
        Returns the optional prefix string.
        Returns:
        the optional prefix string
      • prefixTipText

        public String prefixTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setPrefixSeparator

        public void setPrefixSeparator​(String value)
        Sets the prefix separator string.
        Parameters:
        value - the prefix separator string
      • getPrefixSeparator

        public String getPrefixSeparator()
        Returns the prefix separator string.
        Returns:
        the prefix separator string
      • prefixSeparatorTipText

        public String prefixSeparatorTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setExcludedAttributes

        public void setExcludedAttributes​(String value)
        Sets the regular expression for excluding attributes.
        Parameters:
        value - the regular expression
      • getExcludedAttributes

        public String getExcludedAttributes()
        Returns the regular expression for excluding attributes.
        Returns:
        the regular expression
      • excludedAttributesTipText

        public String excludedAttributesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setInvertMatchingSense

        public void setInvertMatchingSense​(boolean value)
        Sets whether to invert the matching sense.
        Parameters:
        value - if true then matching sense gets inverted
      • getInvertMatchingSense

        public boolean getInvertMatchingSense()
        Returns whether to invert the matching sense.
        Returns:
        true if the matching sense gets inverted
      • invertMatchingSenseTipText

        public String invertMatchingSenseTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setUniqueID

        public void setUniqueID​(String value)
        Sets the attribute (string/numeric) to use for uniquely identifying rows.
        Parameters:
        value - the attribute name
      • getUniqueID

        public String getUniqueID()
        Returns the attribute (string/numeric) to use for uniquely identifying rows.
        Returns:
        the attribute name
      • uniqueIDTipText

        public String uniqueIDTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setKeepOnlySingleUniqueID

        public void setKeepOnlySingleUniqueID​(boolean value)
        Sets whether to keep only a single instance of the unique ID attribute.
        Parameters:
        value - true if only a single instance is to be kept
      • getKeepOnlySingleUniqueID

        public boolean getKeepOnlySingleUniqueID()
        Returns whether to keep only a single instance of the unique ID attribute.
        Returns:
        true if only a single instance is kept
      • keepOnlySingleUniqueIDTipText

        public String keepOnlySingleUniqueIDTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setStrict

        public void setStrict​(boolean value)
        Sets whether to enforce uniqueness in IDs.
        Parameters:
        value - true if uniqueness is to be enforced
      • getStrict

        public boolean getStrict()
        Returns whether to enforce uniqueness in IDs.
        Returns:
        true if uniqueness is enforced
      • strictTipText

        public String strictTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getQuickInfo

        public String getQuickInfo()
        Returns a quick info about the actor, which will be displayed in the GUI.
        Specified by:
        getQuickInfo in interface adams.flow.core.Actor
        Specified by:
        getQuickInfo in interface adams.core.QuickInfoSupporter
        Overrides:
        getQuickInfo in class adams.flow.core.AbstractActor
        Returns:
        null if no info available, otherwise short string
      • accepts

        public Class[] accepts()
        Returns the classes that the consumer accepts.
        Specified by:
        accepts in interface adams.flow.core.InputConsumer
        Returns:
        java.lang.String[].class, java.io.File[].class, weka.core.Instance[].class, weka.core.Instances[].class
      • generates

        public Class[] generates()
        Returns the class of objects that it generates.
        Specified by:
        generates in interface adams.flow.core.OutputProducer
        Returns:
        weka.core.Instances.class
      • excludeAttributes

        protected weka.core.Instances excludeAttributes​(weka.core.Instances inst)
        Excludes attributes from the data.
        Parameters:
        inst - the data to process
        Returns:
        the processed data
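
How the exclusion expression and the inverted matching sense interact can be sketched in plain Java. The sketch assumes the regular expression has to match the whole attribute name (it may instead be applied as a partial match). The class `ExcludeAttributesSketch` and the method `keptAttributes` are hypothetical and work on attribute names only, not on `weka.core.Instances`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ExcludeAttributesSketch {

  /**
   * Returns the attribute names that survive the exclusion step
   * (hypothetical helper, not part of ADAMS).
   *
   * @param names  the attribute names of one dataset
   * @param regex  the exclusion expression; empty keeps all attributes
   * @param invert if true, the expression selects the attributes to keep
   */
  public static List<String> keptAttributes(List<String> names, String regex, boolean invert) {
    if (regex.isEmpty())
      return new ArrayList<>(names);
    Pattern pattern = Pattern.compile(regex);
    List<String> kept = new ArrayList<>();
    for (String name : names) {
      // assumption: the expression must match the full attribute name
      boolean matches = pattern.matcher(name).matches();
      // not inverted: a match means "exclude"; inverted: a match means "include"
      if (matches == invert)
        kept.add(name);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> names = List.of("id", "temp_raw", "temp_clean", "label");
    System.out.println(keptAttributes(names, "temp_.*", false));  // [id, label]
    System.out.println(keptAttributes(names, "temp_.*", true));   // [temp_raw, temp_clean]
    System.out.println(keptAttributes(names, "", false));         // [id, temp_raw, temp_clean, label]
  }
}
```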
      • createPrefix

        protected String createPrefix​(weka.core.Instances inst,
                                      int index)
        Generates the prefix for the dataset/index.
        Parameters:
        inst - the current dataset
        index - the index
        Returns:
        the prefix
      • prefixAttributes

        protected weka.core.Instances prefixAttributes​(weka.core.Instances inst,
                                                       int index)
        Prefixes the attributes.
        Parameters:
        index - the index of the dataset
        inst - the data to process
        Returns:
        the processed data
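
The way the prefix, '@' placeholder, dataset index and separator might combine into new attribute names is sketched below in plain Java. The composition order (prefix string, then index, then separator) and the use of the unmodified 0-based index are assumptions; the real actor may number or order the parts differently. `PrefixSketch` and its methods are hypothetical and operate on names only.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixSketch {

  /**
   * Builds the prefix for one dataset (hypothetical helper, not part of ADAMS).
   * '@' in the prefix string is replaced with the relation name.
   */
  public static String createPrefix(String relationName, int index, String prefix,
                                    boolean addIndex, String separator) {
    StringBuilder result = new StringBuilder(prefix.replace("@", relationName));
    if (addIndex)
      result.append(index);  // assumption: the 0-based index is appended as is
    result.append(separator);
    return result.toString();
  }

  /** Prepends the given prefix to every attribute name. */
  public static List<String> prefixAttributes(List<String> names, String prefix) {
    List<String> result = new ArrayList<>();
    for (String name : names)
      result.add(prefix + name);
    return result;
  }

  public static void main(String[] args) {
    // defaults from the option list: prefix "dataset", separator "-"
    String withIndex = createPrefix("iris", 0, "dataset", true, "-");
    System.out.println(withIndex);                                   // dataset0-
    System.out.println(createPrefix("iris", 1, "@", false, "-"));    // iris-
    System.out.println(prefixAttributes(List.of("petallength", "class"), withIndex));
    // [dataset0-petallength, dataset0-class]
  }
}
```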
      • prepareData

        protected weka.core.Instances prepareData​(weka.core.Instances inst,
                                                  int index)
        Prepares the data, prefixing attributes, removing columns, etc, before merging it.
        Parameters:
        inst - the data to process
        index - the 0-based index of the dataset being processed
        Returns:
        the prepared data
      • updateIDs

        protected void updateIDs​(int instIndex,
                                 weka.core.Instances inst,
                                 HashSet ids)
        Updates the IDs in the hashset with the ones stored in the ID attribute of the provided dataset.
        Parameters:
        instIndex - the dataset index
        inst - the dataset to obtain the IDs from
        ids - the hashset to store the IDs in
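
Collecting IDs across datasets, combined with the uniqueness check of the strict option, could look like the following plain-Java sketch. It assumes the strict check is performed while the IDs of a dataset are collected and that a violation is reported as an exception; the actual method takes no `strict` parameter and may report the problem differently. `UpdateIDsSketch` and the list-based ID column are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class UpdateIDsSketch {

  /**
   * Adds the IDs of one dataset to the shared set (hypothetical helper, not part of ADAMS).
   *
   * @param datasetIndex the index of the dataset, used in the error message
   * @param idColumn     the values of the ID attribute, one per row
   * @param ids          the set collecting the IDs of all datasets
   * @param strict       if true, duplicate IDs within one dataset are an error
   */
  public static void updateIDs(int datasetIndex, List<String> idColumn,
                               Set<String> ids, boolean strict) {
    Set<String> seen = new HashSet<>();
    for (String id : idColumn) {
      // Set.add returns false if the ID already occurred in this dataset
      if (!seen.add(id) && strict)
        throw new IllegalStateException(
          "Duplicate ID '" + id + "' in dataset at index " + datasetIndex);
      ids.add(id);
    }
  }

  public static void main(String[] args) {
    Set<String> ids = new TreeSet<>();  // sorted only to get a stable printout
    updateIDs(0, List.of("a", "b"), ids, true);
    updateIDs(1, List.of("b", "c"), ids, true);
    System.out.println(ids);  // [a, b, c]
    try {
      updateIDs(2, List.of("d", "d"), ids, true);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());  // Duplicate ID 'd' in dataset at index 2
    }
  }
}
```

The same ID appearing in different datasets is expected, since that is what links the rows; only repeats within a single dataset are treated as a violation here.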
      • merge

        protected weka.core.Instances merge​(weka.core.Instances[] orig,
                                            weka.core.Instances[] inst,
                                            HashSet ids)
        Merges the datasets based on the collected IDs.
        Parameters:
        orig - the original datasets
        inst - the processed datasets to merge into one
        ids - the IDs for identifying the rows
        Returns:
        the merged dataset
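
An ID-based merge, including the effect of the remove option, can be sketched in plain Java as follows. Each dataset is reduced to a map from ID to a single value, and `null` stands in for a missing value; both are simplifications. `MergeByIDSketch` and its `merge` method are hypothetical and do not reflect how the actor handles `weka.core.Instances` internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class MergeByIDSketch {

  /**
   * Merges datasets on their IDs (hypothetical helper, not part of ADAMS).
   *
   * @param datasets one map per dataset, ID to attribute value
   * @param remove   if true, only IDs present in every dataset are kept
   * @return ID to merged row; null marks a value missing from a dataset
   */
  public static Map<String, List<String>> merge(List<Map<String, String>> datasets,
                                                boolean remove) {
    // collect the IDs of all datasets, sorted for a stable row order
    TreeSet<String> ids = new TreeSet<>();
    for (Map<String, String> dataset : datasets)
      ids.addAll(dataset.keySet());

    Map<String, List<String>> merged = new LinkedHashMap<>();
    for (String id : ids) {
      List<String> row = new ArrayList<>();
      boolean complete = true;
      for (Map<String, String> dataset : datasets) {
        if (!dataset.containsKey(id))
          complete = false;
        row.add(dataset.get(id));  // null if this dataset has no row for the ID
      }
      if (complete || !remove)
        merged.put(id, row);
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> a = Map.of("1", "x", "2", "y");
    Map<String, String> b = Map.of("2", "20.5", "3", "30.5");
    System.out.println(merge(List.of(a, b), false));  // {1=[x, null], 2=[y, 20.5], 3=[null, 30.5]}
    System.out.println(merge(List.of(a, b), true));   // {2=[y, 20.5]}
  }
}
```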
      • doExecute

        protected String doExecute()
        Executes the flow item.
        Specified by:
        doExecute in class adams.flow.core.AbstractActor
        Returns:
        null if everything is fine, otherwise error message