adams.flow.transformer
Class WekaInstancesMerge

java.lang.Object
  extended by adams.core.ConsoleObject
      extended by adams.core.option.AbstractOptionHandler
          extended by adams.flow.core.AbstractActor
              extended by adams.flow.transformer.AbstractTransformer
                  extended by adams.flow.transformer.WekaInstancesMerge
All Implemented Interfaces:
AdditionalInformationHandler, CleanUpHandler, Debuggable, DebugOutputHandler, Destroyable, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<AbstractActor>, SizeOfHandler, Stoppable, VariableChangeListener, ErrorHandler, InputConsumer, OutputProducer, ProvenanceSupporter, Serializable, Comparable

public class WekaInstancesMerge
extends AbstractTransformer
implements ProvenanceSupporter

Merges multiple datasets.
If no 'ID' attribute is named, then all datasets must contain the same number of rows.
Attributes can be excluded from ending up in the final dataset via a regular expression. They can also be prefixed with name and/or index.
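
The row-count requirement above can be illustrated with a minimal, self-contained sketch. It does not use the ADAMS or WEKA classes; datasets are modelled as lists of string arrays, and the names RowCountMergeSketch, checkRowCounts and mergeSideBySide are hypothetical. It assumes that, without an ID attribute, row n of every dataset is simply joined into row n of the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowCountMergeSketch {

  /** Returns null if all datasets have the same number of rows, otherwise an error message. */
  static String checkRowCounts(List<List<String[]>> datasets) {
    if (datasets.isEmpty())
      return "No datasets provided!";
    int expected = datasets.get(0).size();
    for (int i = 1; i < datasets.size(); i++) {
      int actual = datasets.get(i).size();
      if (actual != expected)
        return "Dataset #" + (i + 1) + " has " + actual + " rows, expected " + expected;
    }
    return null;
  }

  /** Joins row n of every dataset into row n of the result (assumed behaviour without an ID). */
  static List<String[]> mergeSideBySide(List<List<String[]>> datasets) {
    String error = checkRowCounts(datasets);
    if (error != null)
      throw new IllegalArgumentException(error);
    List<String[]> result = new ArrayList<>();
    for (int row = 0; row < datasets.get(0).size(); row++) {
      List<String> merged = new ArrayList<>();
      for (List<String[]> dataset : datasets)
        merged.addAll(Arrays.asList(dataset.get(row)));
      result.add(merged.toArray(new String[0]));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String[]> a = Arrays.asList(new String[]{"1", "x"}, new String[]{"2", "y"});
    List<String[]> b = Arrays.asList(new String[]{"red"}, new String[]{"blue"});
    for (String[] row : mergeSideBySide(Arrays.asList(a, b)))
      System.out.println(String.join(",", row));
  }
}
```

Datasets with differing row counts are rejected up front in this sketch, which is why naming an ID attribute is the way to combine datasets of unequal size.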

Input/output:
- accepts:
   java.lang.String
   java.lang.String[]
   java.io.File
   java.io.File[]
- generates:
   weka.core.Instances

Valid options are:

-D (property: debug)
    If set to true, scheme may output additional info to the console.
 
-name <java.lang.String> (property: name)
    The name of the actor.
    default: InstancesMerge
 
-annotation <adams.core.base.BaseText> (property: annotations)
    The annotations to attach to this actor.
    default:
 
-skip (property: skip)
    If set to true, transformation is skipped and the input token is just forwarded
    as it is.
 
-use-prefix (property: usePrefix)
    Whether to prefix the attribute names of each dataset with an index and
    an optional string.
 
-add-index (property: addIndex)
    Whether to add the index of the dataset to the prefix.
 
-prefix <java.lang.String> (property: prefix)
    The optional prefix string to prefix the index number with (in case prefixes
    are used); '@' is a placeholder for the relation name.
    default: dataset
 
-prefix-separator <java.lang.String> (property: prefixSeparator)
    The separator string between the generated prefix and the original attribute
    name.
    default: -
 
-exclude-atts <java.lang.String> (property: excludedAttributes)
    The regular expression used on the attribute names, to determine whether
    an attribute should be excluded or not (matching sense can be inverted);
    leave empty to include all attributes.
    default:
 
-invert (property: invertMatchingSense)
    Whether to invert the matching sense of excluding attributes, i.e., the regular
    expression is used for including attributes.
 
-unique-id <java.lang.String> (property: uniqueID)
    The name of the attribute (string/numeric) used for uniquely identifying
    rows among the datasets.
    default:
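
How the prefix-related options (-use-prefix, -add-index, -prefix, -prefix-separator) might combine into a new attribute name is sketched below. This is an illustration only, not the actor's code: the class and method names are hypothetical, and it assumes that the index is appended directly after the prefix string and shown 1-based. Only the '@' placeholder for the relation name and the default values "dataset" and "-" are taken from the option descriptions above.

```java
final class PrefixSketch {

  /**
   * Builds the prefixed attribute name.
   * Assumptions: the index follows the prefix string and is displayed 1-based.
   */
  static String prefixName(String attName, String relationName, int index,
                           String prefix, boolean addIndex, String separator) {
    // '@' is the placeholder for the relation name
    StringBuilder result = new StringBuilder(prefix.replace("@", relationName));
    if (addIndex)
      result.append(index + 1);
    result.append(separator);
    result.append(attName);
    return result.toString();
  }

  public static void main(String[] args) {
    // prints dataset1-petalwidth
    System.out.println(prefixName("petalwidth", "iris", 0, "dataset", true, "-"));
    // prints iris_petalwidth
    System.out.println(prefixName("petalwidth", "iris", 1, "@", false, "_"));
  }
}
```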
 

Version:
$Revision: 4584 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Field Summary
protected  boolean m_AddIndex
          whether to add the index to the prefix.
protected  int m_AttType
          the attribute type of the ID attribute.
protected  String m_ExcludedAttributes
          regular expression for excluding attributes from the datasets.
protected  boolean m_InvertMatchingSense
          whether to invert the matching sense for excluding attributes.
protected  String m_Prefix
          the additional prefix name to use, apart from the index.
protected  String m_PrefixSeparator
          the separator between the generated prefix and the actual attribute name.
protected  String m_UniqueID
          the string or numeric attribute to use as unique identifier for rows.
protected  boolean m_UsePrefix
          whether to prefix the attribute names of each dataset with an index.
 
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
 
Fields inherited from class adams.flow.core.AbstractActor
FILE_EXTENSION, FILE_EXTENSION_GZ, m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_FullName, m_Headless, m_Name, m_Parent, m_Root, m_Self, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
 
Fields inherited from class adams.core.option.AbstractOptionHandler
m_DebugLevel, m_OptionManager
 
Constructor Summary
WekaInstancesMerge()
           
 
Method Summary
 Class[] accepts()
          Returns the classes that the consumer accepts.
 String addIndexTipText()
          Returns the tip text for this property.
 void defineOptions()
          Adds options to the internal list of options.
protected  String doExecute()
          Executes the flow item.
protected  weka.core.Instances excludeAttributes(weka.core.Instances inst, int index)
          Excludes attributes from the data.
 String excludedAttributesTipText()
          Returns the tip text for this property.
 Class[] generates()
          Returns the class of objects that it generates.
 boolean getAddIndex()
          Returns whether to add the dataset index number to the prefix.
 String getExcludedAttributes()
          Returns the regular expression for excluding attributes.
 boolean getInvertMatchingSense()
          Returns whether to invert the matching sense.
 String getPrefix()
          Returns the optional prefix string.
 String getPrefixSeparator()
          Returns the prefix separator string.
 String getUniqueID()
          Returns the attribute (string/numeric) to use for uniquely identifying rows.
 boolean getUsePrefix()
          Returns whether to use prefixes.
 String globalInfo()
          Returns a string describing the object.
 String invertMatchingSenseTipText()
          Returns the tip text for this property.
protected  weka.core.Instances merge(weka.core.Instances[] orig, weka.core.Instances[] inst, HashSet ids)
          Merges the datasets based on the collected IDs.
protected  weka.core.Instances prefixAttributes(weka.core.Instances inst, int index)
          Prefixes the attributes.
 String prefixSeparatorTipText()
          Returns the tip text for this property.
 String prefixTipText()
          Returns the tip text for this property.
protected  weka.core.Instances prepareData(weka.core.Instances inst, int index)
          Prepares the data, prefixing attributes, removing columns, etc, before merging it.
 void setAddIndex(boolean value)
          Sets whether to add the dataset index number to the prefix.
 void setExcludedAttributes(String value)
          Sets the regular expression for excluding attributes.
 void setInvertMatchingSense(boolean value)
          Sets whether to invert the matching sense.
 void setPrefix(String value)
          Sets the optional prefix string.
 void setPrefixSeparator(String value)
          Sets the prefix separator string.
 void setUniqueID(String value)
          Sets the attribute (string/numeric) to use for uniquely identifying rows.
 void setUsePrefix(boolean value)
          Sets whether to use prefixes.
 String uniqueIDTipText()
          Returns the tip text for this property.
protected  void updateIDs(weka.core.Instances inst, HashSet ids)
          Updates the IDs in the hashset with the ones stored in the ID attribute of the provided dataset.
 void updateProvenance(ProvenanceContainer cont)
          Updates the provenance information in the provided container.
 String usePrefixTipText()
          Returns the tip text for this property.
 
Methods inherited from class adams.flow.transformer.AbstractTransformer
backupState, execute, hasPendingOutput, input, output, postExecute, reset, restoreState, wrapUp
 
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, debug, destroy, equals, findVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFullName, getName, getNextSibling, getParent, getPreviousSibling, getQuickInfo, getRoot, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, initialize, isBackedUp, isExecuted, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, preExecute, pruneBackup, pruneBackup, setAnnotations, setErrorHandler, setHeadless, setName, setParent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
 
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, debug, debugLevelTipText, finishInit, getDebugLevel, getOptionManager, isDebugOn, newOptionManager, setDebugLevel, toCommandLine, toString
 
Methods inherited from class adams.core.ConsoleObject
getDebugging, getSystemErr, getSystemOut
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_UsePrefix

protected boolean m_UsePrefix
whether to prefix the attribute names of each dataset with an index.


m_AddIndex

protected boolean m_AddIndex
whether to add the index to the prefix.


m_Prefix

protected String m_Prefix
the additional prefix name to use, apart from the index.


m_PrefixSeparator

protected String m_PrefixSeparator
the separator between the generated prefix and the actual attribute name.


m_ExcludedAttributes

protected String m_ExcludedAttributes
regular expression for excluding attributes from the datasets.


m_InvertMatchingSense

protected boolean m_InvertMatchingSense
whether to invert the matching sense for excluding attributes.


m_UniqueID

protected String m_UniqueID
the string or numeric attribute to use as unique identifier for rows.


m_AttType

protected int m_AttType
the attribute type of the ID attribute.

Constructor Detail

WekaInstancesMerge

public WekaInstancesMerge()
Method Detail

globalInfo

public String globalInfo()
Returns a string describing the object.

Specified by:
globalInfo in class AbstractOptionHandler
Returns:
a description suitable for displaying in the gui

defineOptions

public void defineOptions()
Adds options to the internal list of options.

Specified by:
defineOptions in interface OptionHandler
Overrides:
defineOptions in class AbstractActor

setUsePrefix

public void setUsePrefix(boolean value)
Sets whether to use prefixes.

Parameters:
value - if true then the attributes will get prefixed

getUsePrefix

public boolean getUsePrefix()
Returns whether to use prefixes.

Returns:
true if the attributes will get prefixed

usePrefixTipText

public String usePrefixTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setAddIndex

public void setAddIndex(boolean value)
Sets whether to add the dataset index number to the prefix.

Parameters:
value - if true then the index will be used in the prefix

getAddIndex

public boolean getAddIndex()
Returns whether to add the dataset index number to the prefix.

Returns:
true if the index will be used in the prefix

addIndexTipText

public String addIndexTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setPrefix

public void setPrefix(String value)
Sets the optional prefix string.

Parameters:
value - the optional prefix string

getPrefix

public String getPrefix()
Returns the optional prefix string.

Returns:
the optional prefix string

prefixTipText

public String prefixTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setPrefixSeparator

public void setPrefixSeparator(String value)
Sets the prefix separator string.

Parameters:
value - the prefix separator string

getPrefixSeparator

public String getPrefixSeparator()
Returns the prefix separator string.

Returns:
the prefix separator string

prefixSeparatorTipText

public String prefixSeparatorTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setExcludedAttributes

public void setExcludedAttributes(String value)
Sets the regular expression for excluding attributes.

Parameters:
value - the regular expression

getExcludedAttributes

public String getExcludedAttributes()
Returns the regular expression for excluding attributes.

Returns:
the regular expression

excludedAttributesTipText

public String excludedAttributesTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setInvertMatchingSense

public void setInvertMatchingSense(boolean value)
Sets whether to invert the matching sense.

Parameters:
value - if true then matching sense gets inverted

getInvertMatchingSense

public boolean getInvertMatchingSense()
Returns whether to invert the matching sense.

Returns:
true if the matching sense gets inverted

invertMatchingSenseTipText

public String invertMatchingSenseTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setUniqueID

public void setUniqueID(String value)
Sets the attribute (string/numeric) to use for uniquely identifying rows.

Parameters:
value - the attribute name

getUniqueID

public String getUniqueID()
Returns the attribute (string/numeric) to use for uniquely identifying rows.

Returns:
the attribute name

uniqueIDTipText

public String uniqueIDTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

accepts

public Class[] accepts()
Returns the classes that the consumer accepts.

Specified by:
accepts in interface InputConsumer
Returns:
java.lang.String.class, java.lang.String[].class, java.io.File.class, java.io.File[].class
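
Since four input types are accepted, a consumer of such tokens typically normalizes them to a single representation first. The following sketch shows one way to do that with plain JDK classes; the class InputNormalizationSketch and the method toFiles are hypothetical and not part of the documented API.

```java
import java.io.File;

final class InputNormalizationSketch {

  /** Converts a String, String[], File or File[] payload into a File array. */
  static File[] toFiles(Object input) {
    if (input == null)
      throw new IllegalArgumentException("No input provided!");
    if (input instanceof String)
      return new File[]{new File((String) input)};
    if (input instanceof File)
      return new File[]{(File) input};
    if (input instanceof String[]) {
      String[] names = (String[]) input;
      File[] files = new File[names.length];
      for (int i = 0; i < names.length; i++)
        files[i] = new File(names[i]);
      return files;
    }
    if (input instanceof File[])
      return (File[]) input;
    throw new IllegalArgumentException("Unhandled input type: " + input.getClass().getName());
  }

  public static void main(String[] args) {
    File[] files = toFiles(new String[]{"a.arff", "b.arff"});
    for (File file : files)
      System.out.println(file.getName());
  }
}
```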

generates

public Class[] generates()
Returns the class of objects that it generates.

Specified by:
generates in interface OutputProducer
Returns:
weka.core.Instances.class

excludeAttributes

protected weka.core.Instances excludeAttributes(weka.core.Instances inst,
                                                int index)
Excludes attributes from the data.

Parameters:
index - the index of the dataset
inst - the data to process
Returns:
the processed data
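
The interplay of the exclusion regular expression and the inverted matching sense can be sketched on plain attribute names. This is a simplified illustration with hypothetical names (ExcludeAttributesSketch, keptAttributes); it assumes that the expression must match the whole attribute name, which the documentation does not state, and it operates on names only rather than on weka.core.Instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class ExcludeAttributesSketch {

  /** Returns the attribute names that remain after applying the expression. */
  static List<String> keptAttributes(List<String> names, String regex, boolean invert) {
    // an empty expression includes all attributes
    if (regex.isEmpty())
      return new ArrayList<>(names);
    Pattern pattern = Pattern.compile(regex);
    List<String> kept = new ArrayList<>();
    for (String name : names) {
      // assumption: the whole name has to match
      boolean matches = pattern.matcher(name).matches();
      // normal sense: drop matches; inverted sense: keep only matches
      if (matches == invert)
        kept.add(name);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("id", "sepallength", "sepalwidth", "class");
    // prints [id, class]
    System.out.println(keptAttributes(names, "sepal.*", false));
    // prints [sepallength, sepalwidth]
    System.out.println(keptAttributes(names, "sepal.*", true));
  }
}
```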

prefixAttributes

protected weka.core.Instances prefixAttributes(weka.core.Instances inst,
                                               int index)
Prefixes the attributes.

Parameters:
index - the index of the dataset
inst - the data to process
Returns:
the processed data

prepareData

protected weka.core.Instances prepareData(weka.core.Instances inst,
                                          int index)
Prepares the data, prefixing attributes, removing columns, etc, before merging it.

Parameters:
inst - the data to process
index - the 0-based index of the dataset being processed
Returns:
the prepared data

updateIDs

protected void updateIDs(weka.core.Instances inst,
                         HashSet ids)
Updates the IDs in the hashset with the ones stored in the ID attribute of the provided dataset.

Parameters:
inst - the dataset to obtain the IDs from
ids - the hashset to store the IDs in
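
Collecting the IDs of several datasets into one set, as described above, might look like the following sketch. It is not the actor's implementation: rows are modelled as string arrays, IDs are treated as strings only (the actor also supports numeric ID attributes), and the names CollectIdsSketch and updateIds are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CollectIdsSketch {

  /** Adds the value of the ID column of every row to the set. */
  static void updateIds(List<String[]> rows, int idColumn, Set<String> ids) {
    for (String[] row : rows)
      ids.add(row[idColumn]);
  }

  public static void main(String[] args) {
    List<String[]> first = Arrays.asList(new String[]{"1", "x"}, new String[]{"2", "y"});
    List<String[]> second = Arrays.asList(new String[]{"2", "red"}, new String[]{"3", "blue"});
    Set<String> ids = new HashSet<>();
    updateIds(first, 0, ids);
    updateIds(second, 0, ids);
    // the ID "2" occurs in both datasets but is stored only once; prints 3
    System.out.println(ids.size());
  }
}
```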

merge

protected weka.core.Instances merge(weka.core.Instances[] orig,
                                    weka.core.Instances[] inst,
                                    HashSet ids)
Merges the datasets based on the collected IDs.

Parameters:
orig - the original datasets
inst - the processed datasets to merge into one
ids - the IDs for identifying the rows
Returns:
the merged dataset
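
A merge driven by the collected IDs can be pictured as a join over all datasets: for every ID, the values of each dataset are appended to one output row. The sketch below shows this idea with plain JDK collections. It is an assumption-laden illustration, not the documented algorithm: the names MergeByIdSketch and merge (with this signature) are hypothetical, each dataset is modelled as a map from ID to the row's non-ID values, and rows that are absent from a dataset are padded with null to stand in for missing values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class MergeByIdSketch {

  /**
   * Merges the datasets row by row, keyed on the ID.
   * widths[i] is the number of non-ID columns of dataset i.
   */
  static Map<String, List<String>> merge(List<Map<String, String[]>> datasets,
                                         int[] widths, Iterable<String> ids) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (String id : ids) {
      List<String> row = new ArrayList<>();
      for (int i = 0; i < datasets.size(); i++) {
        String[] values = datasets.get(i).get(id);
        // assumption: an ID missing from a dataset yields missing (null) values
        if (values == null)
          values = new String[widths[i]];
        row.addAll(Arrays.asList(values));
      }
      result.put(id, row);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String[]> a = new HashMap<>();
    a.put("1", new String[]{"x"});
    a.put("2", new String[]{"y"});
    Map<String, String[]> b = new HashMap<>();
    b.put("2", new String[]{"red"});
    b.put("3", new String[]{"blue"});
    TreeSet<String> ids = new TreeSet<>(Arrays.asList("1", "2", "3"));
    Map<String, List<String>> merged = merge(Arrays.asList(a, b), new int[]{1, 1}, ids);
    for (Map.Entry<String, List<String>> entry : merged.entrySet())
      System.out.println(entry.getKey() + " -> " + entry.getValue());
  }
}
```

Because the output is keyed on the ID rather than on the row position, the datasets in this sketch need not have the same number of rows.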

doExecute

protected String doExecute()
Executes the flow item.

Specified by:
doExecute in class AbstractActor
Returns:
null if everything is fine, otherwise error message

updateProvenance

public void updateProvenance(ProvenanceContainer cont)
Updates the provenance information in the provided container.

Specified by:
updateProvenance in interface ProvenanceSupporter
Parameters:
cont - the provenance container to update


Copyright © 2012 University of Waikato, Hamilton, NZ. All Rights Reserved.