Package adams.flow.transformer
Class WekaInstancesMerge
- java.lang.Object
  - adams.core.logging.LoggingObject
    - adams.core.logging.CustomLoggingLevelObject
      - adams.core.option.AbstractOptionHandler
        - adams.flow.core.AbstractActor
          - adams.flow.transformer.AbstractTransformer
            - adams.flow.transformer.WekaInstancesMerge
All Implemented Interfaces:
adams.core.AdditionalInformationHandler, adams.core.ClassCrossReference, adams.core.CleanUpHandler, adams.core.CrossReference, adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.QuickInfoSupporter, adams.core.ShallowCopySupporter<adams.flow.core.Actor>, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.VariablesInspectionHandler, adams.event.VariableChangeListener, adams.flow.core.Actor, adams.flow.core.ErrorHandler, adams.flow.core.InputConsumer, adams.flow.core.OutputProducer, WekaMergeInstancesActor, Serializable, Comparable
public class WekaInstancesMerge extends adams.flow.transformer.AbstractTransformer implements WekaMergeInstancesActor, adams.core.ClassCrossReference
Merges multiple datasets, either loaded from files or supplied as Instances/Instance objects.
If no unique ID attribute is named (see -unique-id), all datasets must contain the same number of rows, since rows are then merged by position.
Attributes can be excluded from the final dataset via a regular expression. Attribute names can also be prefixed with a name and/or the dataset index.
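Without a unique ID, rows can only be paired by their position, which is why every dataset must have the same number of rows. The sketch below illustrates that idea using only the Java standard library; it is not the actor's implementation, and the class PositionalMergeSketch and its list-of-rows model (standing in for weka.core.Instances) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stdlib-only model of a positional merge: a dataset is a list of
// rows and a row is a list of cell values. Not the ADAMS/Weka implementation.
class PositionalMergeSketch {

  /** Concatenates row r of every dataset into row r of the result. */
  static List<List<Object>> mergeByPosition(List<List<List<Object>>> datasets) {
    List<List<Object>> result = new ArrayList<>();
    if (datasets.isEmpty())
      return result;

    // without an ID column, row counts must agree
    int numRows = datasets.get(0).size();
    for (int d = 1; d < datasets.size(); d++) {
      int rows = datasets.get(d).size();
      if (rows != numRows)
        throw new IllegalArgumentException(
            "Dataset #" + (d + 1) + " has " + rows + " rows, expected " + numRows);
    }

    for (int r = 0; r < numRows; r++) {
      List<Object> row = new ArrayList<>();
      for (List<List<Object>> dataset : datasets)
        row.addAll(dataset.get(r));  // append this dataset's columns
      result.add(row);
    }
    return result;
  }
}
```

For example, merging [[1, 2], [3, 4]] with [[x], [y]] gives [[1, 2, x], [3, 4, y]], while a dataset with a different row count raises an IllegalArgumentException.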
Input/output:
- accepts:
java.lang.String[]
java.io.File[]
weka.core.Instance[]
weka.core.Instances[]
- generates:
weka.core.Instances
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-name <java.lang.String> (property: name) The name of the actor. default: WekaInstancesMerge
-annotation <adams.core.base.BaseAnnotation> (property: annotations) The annotations to attach to this actor. default:
-skip <boolean> (property: skip) If set to true, transformation is skipped and the input token is just forwarded as it is. default: false
-stop-flow-on-error <boolean> (property: stopFlowOnError) If set to true, the flow execution at this level gets stopped in case this actor encounters an error; the error gets propagated; useful for critical actors. default: false
-silent <boolean> (property: silent) If enabled, then no errors are output in the console; Note: the enclosing actor handler must have this enabled as well. default: false
-use-prefix <boolean> (property: usePrefix) Whether to prefix the attribute names of each dataset with an index and an optional string. default: false
-add-index <boolean> (property: addIndex) Whether to add the index of the dataset to the prefix. default: false
-remove <boolean> (property: remove) If true, only keep instances where data is available from each source. default: false
-prefix <java.lang.String> (property: prefix) The optional prefix string to prefix the index number with (in case prefixes are used); '@' is a placeholder for the relation name. default: dataset
-prefix-separator <java.lang.String> (property: prefixSeparator) The separator string between the generated prefix and the original attribute name. default: -
-exclude-atts <java.lang.String> (property: excludedAttributes) The regular expression used on the attribute names, to determine whether an attribute should be excluded or not (matching sense can be inverted); leave empty to include all attributes. default:
-invert <boolean> (property: invertMatchingSense) Whether to invert the matching sense of excluding attributes, i.e., the regular expression is used for including attributes. default: false
-unique-id <java.lang.String> (property: uniqueID) The name of the attribute (string/numeric) used for uniquely identifying rows among the datasets. default:
-keep-only-single-unique-id <boolean> (property: keepOnlySingleUniqueID) If enabled, only a single instance of the unique ID attribute is kept. default: false
-strict <boolean> (property: strict) If enabled, ensures that IDs in unique ID column are truly unique. default: false
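The -unique-id, -remove and -keep-only-single-unique-id options interact when rows are joined on an ID. The following hedged sketch models that interaction with plain Java collections. Assumptions (not taken from the ADAMS source): each dataset is a hypothetical Map from ID to the row's remaining values, IDs are collected in first-seen order, absent rows are padded with null (standing in for Weka missing values), and a single ID column, when kept, is placed first. IdMergeSketch is a hypothetical name.

```java
import java.util.*;

// Hypothetical, stdlib-only model of an ID-based merge. Each dataset maps an ID
// to that row's non-ID values; numCols[d] is dataset d's number of non-ID columns.
class IdMergeSketch {

  static List<List<Object>> mergeById(List<Map<String, List<Object>>> datasets,
                                      int[] numCols, boolean remove,
                                      boolean keepOnlySingleUniqueID) {
    // collect every ID, in first-seen order (an assumption)
    Set<String> ids = new LinkedHashSet<>();
    for (Map<String, List<Object>> ds : datasets)
      ids.addAll(ds.keySet());

    List<List<Object>> result = new ArrayList<>();
    for (String id : ids) {
      boolean inAll = true;
      for (Map<String, List<Object>> ds : datasets)
        inAll &= ds.containsKey(id);
      if (remove && !inAll)
        continue;  // -remove: keep only rows with data from every source

      List<Object> row = new ArrayList<>();
      if (keepOnlySingleUniqueID)
        row.add(id);  // one shared ID column up front
      for (int d = 0; d < datasets.size(); d++) {
        List<Object> values = datasets.get(d).get(id);
        if (!keepOnlySingleUniqueID)
          row.add(values != null ? id : null);  // one ID column per dataset
        if (values != null)
          row.addAll(values);
        else
          row.addAll(Collections.nCopies(numCols[d], null));  // null = missing
      }
      result.add(row);
    }
    return result;
  }
}
```

With remove enabled, only IDs present in every dataset survive; with it disabled, the result resembles an outer join padded with missing values.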
- Author: fracpete (fracpete at waikato dot ac dot nz)
- See Also: Serialized Form
-
Field Summary
- protected boolean m_AddIndex: whether to add the index to the prefix.
- protected int m_AttType: the attribute type of the ID attribute.
- protected String m_ExcludedAttributes: regular expression for excluding attributes from the datasets.
- protected boolean m_InvertMatchingSense: whether to invert the matching sense for excluding attributes.
- protected boolean m_KeepOnlySingleUniqueID: whether to keep only a single instance of the unique ID attribute.
- protected String m_Prefix: the additional prefix name to use, apart from the index.
- protected String m_PrefixSeparator: the separator between index and actual attribute name.
- protected boolean m_Remove: whether to remove rows for which data is not available from every source.
- protected boolean m_Strict: whether to fail if IDs are not unique.
- protected String m_UniqueID: the string or numeric attribute to use as unique identifier for rows.
- protected List<String> m_UniqueIDAtts: the unique ID attributes.
- protected boolean m_UsePrefix: whether to prefix the attribute names of each dataset with an index.
-
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
-
Fields inherited from class adams.flow.core.AbstractActor
m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
-
Constructor Summary
- WekaInstancesMerge()
-
Method Summary
- Class[] accepts(): Returns the classes that the consumer accepts.
- String addIndexTipText(): Returns the tip text for this property.
- protected String createPrefix(weka.core.Instances inst, int index): Generates the prefix for the dataset/index.
- void defineOptions(): Adds options to the internal list of options.
- protected String doExecute(): Executes the flow item.
- protected weka.core.Instances excludeAttributes(weka.core.Instances inst): Excludes attributes from the data.
- String excludedAttributesTipText(): Returns the tip text for this property.
- Class[] generates(): Returns the class of objects that it generates.
- boolean getAddIndex(): Returns whether to add the dataset index number to the prefix.
- Class[] getClassCrossReferences(): Returns the cross-referenced classes.
- String getExcludedAttributes(): Returns the regular expression for excluding attributes.
- boolean getInvertMatchingSense(): Returns whether to invert the matching sense.
- boolean getKeepOnlySingleUniqueID(): Returns whether to keep only a single instance of the unique ID attribute.
- String getPrefix(): Returns the optional prefix string.
- String getPrefixSeparator(): Returns the prefix separator string.
- String getQuickInfo(): Returns quick info about the actor, which will be displayed in the GUI.
- boolean getRemove(): Returns whether to remove rows for which data is not available from every source.
- boolean getStrict(): Returns whether to enforce uniqueness in IDs.
- String getUniqueID(): Returns the attribute (string/numeric) to use for uniquely identifying rows.
- boolean getUsePrefix(): Returns whether to use prefixes.
- String globalInfo(): Returns a string describing the object.
- String invertMatchingSenseTipText(): Returns the tip text for this property.
- String keepOnlySingleUniqueIDTipText(): Returns the tip text for this property.
- protected weka.core.Instances merge(weka.core.Instances[] orig, weka.core.Instances[] inst, HashSet ids): Merges the datasets based on the collected IDs.
- protected weka.core.Instances prefixAttributes(weka.core.Instances inst, int index): Prefixes the attributes.
- String prefixSeparatorTipText(): Returns the tip text for this property.
- String prefixTipText(): Returns the tip text for this property.
- protected weka.core.Instances prepareData(weka.core.Instances inst, int index): Prepares the data (prefixing attributes, removing columns, etc.) before merging it.
- String removeTipText(): Returns the tip text for this property.
- void setAddIndex(boolean value): Sets whether to add the dataset index number to the prefix.
- void setExcludedAttributes(String value): Sets the regular expression for excluding attributes.
- void setInvertMatchingSense(boolean value): Sets whether to invert the matching sense.
- void setKeepOnlySingleUniqueID(boolean value): Sets whether to keep only a single instance of the unique ID attribute.
- void setPrefix(String value): Sets the optional prefix string.
- void setPrefixSeparator(String value): Sets the prefix separator string.
- void setRemove(boolean value): Sets whether to remove rows for which data is not available from every source.
- void setStrict(boolean value): Sets whether to enforce uniqueness in IDs.
- void setUniqueID(String value): Sets the attribute (string/numeric) to use for uniquely identifying rows.
- void setUsePrefix(boolean value): Sets whether to use prefixes.
- String strictTipText(): Returns the tip text for this property.
- String uniqueIDTipText(): Returns the tip text for this property.
- protected void updateIDs(int instIndex, weka.core.Instances inst, HashSet ids): Updates the IDs in the hash set with the ones stored in the ID attribute of the provided dataset.
- String usePrefixTipText(): Returns the tip text for this property.
-
Methods inherited from class adams.flow.transformer.AbstractTransformer
backupState, currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute, restoreState, wrapUp
-
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, initialize, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, pruneBackup, reset, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.flow.core.Actor
cleanUp, compareTo, destroy, equals, execute, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged, wrapUp
-
Field Detail
-
m_UsePrefix
protected boolean m_UsePrefix
whether to prefix the attribute names of each dataset with an index.
-
m_AddIndex
protected boolean m_AddIndex
whether to add the index to the prefix.
-
m_Remove
protected boolean m_Remove
whether to remove rows for which data is not available from every source.
-
m_Prefix
protected String m_Prefix
the additional prefix name to use, apart from the index.
-
m_PrefixSeparator
protected String m_PrefixSeparator
the separator between index and actual attribute name.
-
m_ExcludedAttributes
protected String m_ExcludedAttributes
regular expression for excluding attributes from the datasets.
-
m_InvertMatchingSense
protected boolean m_InvertMatchingSense
whether to invert the matching sense for excluding attributes.
-
m_UniqueID
protected String m_UniqueID
the string or numeric attribute to use as unique identifier for rows.
-
m_KeepOnlySingleUniqueID
protected boolean m_KeepOnlySingleUniqueID
whether to keep only a single instance of the unique ID attribute.
-
m_Strict
protected boolean m_Strict
whether to fail if IDs are not unique.
-
m_AttType
protected int m_AttType
the attribute type of the ID attribute.
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by: globalInfo in interface adams.core.GlobalInfoSupporter
- Specified by: globalInfo in class adams.core.option.AbstractOptionHandler
- Returns: a description suitable for displaying in the GUI
-
getClassCrossReferences
public Class[] getClassCrossReferences()
Returns the cross-referenced classes.
- Specified by: getClassCrossReferences in interface adams.core.ClassCrossReference
- Returns: the classes
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by: defineOptions in interface adams.core.option.OptionHandler
- Overrides: defineOptions in class adams.flow.core.AbstractActor
-
setRemove
public void setRemove(boolean value)
Sets whether to remove rows for which data is not available from every source.
- Parameters: value - if true, rows are removed unless every dataset supplies data for them
-
getRemove
public boolean getRemove()
Returns whether to remove rows for which data is not available from every source.
- Returns: true if rows are removed unless every dataset supplies data for them
-
removeTipText
public String removeTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUsePrefix
public void setUsePrefix(boolean value)
Sets whether to use prefixes.
- Parameters: value - if true, the attributes will get prefixed
-
getUsePrefix
public boolean getUsePrefix()
Returns whether to use prefixes.
- Returns: true if the attributes will get prefixed
-
usePrefixTipText
public String usePrefixTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setAddIndex
public void setAddIndex(boolean value)
Sets whether to add the dataset index number to the prefix.
- Parameters: value - if true, the index will be used in the prefix
-
getAddIndex
public boolean getAddIndex()
Returns whether to add the dataset index number to the prefix.
- Returns: true if the index will be used in the prefix
-
addIndexTipText
public String addIndexTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setPrefix
public void setPrefix(String value)
Sets the optional prefix string.
- Parameters: value - the optional prefix string
-
getPrefix
public String getPrefix()
Returns the optional prefix string.
- Returns: the optional prefix string
-
prefixTipText
public String prefixTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setPrefixSeparator
public void setPrefixSeparator(String value)
Sets the prefix separator string.
- Parameters: value - the prefix separator string
-
getPrefixSeparator
public String getPrefixSeparator()
Returns the prefix separator string.
- Returns: the prefix separator string
-
prefixSeparatorTipText
public String prefixSeparatorTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setExcludedAttributes
public void setExcludedAttributes(String value)
Sets the regular expression for excluding attributes.
- Parameters: value - the regular expression
-
getExcludedAttributes
public String getExcludedAttributes()
Returns the regular expression for excluding attributes.
- Returns: the regular expression
-
excludedAttributesTipText
public String excludedAttributesTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setInvertMatchingSense
public void setInvertMatchingSense(boolean value)
Sets whether to invert the matching sense.
- Parameters: value - if true, the matching sense gets inverted
-
getInvertMatchingSense
public boolean getInvertMatchingSense()
Returns whether to invert the matching sense.
- Returns: true if the matching sense is inverted
-
invertMatchingSenseTipText
public String invertMatchingSenseTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
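The exclusion and inversion options can be pictured as a filter over attribute names. This sketch assumes the regular expression must match the whole name (Matcher.matches()); whether the actor uses full or partial matching, and how it treats the unique ID attribute, is not stated here, so treat ExcludeSketch as a hypothetical illustration rather than the actor's excludeAttributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of attribute exclusion by regular expression.
// Assumption: the expression must match the entire attribute name.
class ExcludeSketch {

  static List<String> keptAttributes(List<String> names, String regex, boolean invert) {
    if (regex.isEmpty())
      return new ArrayList<>(names);  // empty expression: include all attributes

    Pattern pattern = Pattern.compile(regex);
    List<String> kept = new ArrayList<>();
    for (String name : names) {
      boolean matches = pattern.matcher(name).matches();
      // normal sense: a match excludes; inverted sense: a match includes
      if (matches == invert)
        kept.add(name);
    }
    return kept;
  }
}
```

With names id, temp_a, temp_b, label and the expression temp_.*, the normal sense keeps id and label, while the inverted sense keeps only temp_a and temp_b.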
-
setUniqueID
public void setUniqueID(String value)
Sets the attribute (string/numeric) to use for uniquely identifying rows.
- Parameters: value - the attribute name
-
getUniqueID
public String getUniqueID()
Returns the attribute (string/numeric) to use for uniquely identifying rows.
- Returns: the attribute name
-
uniqueIDTipText
public String uniqueIDTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setKeepOnlySingleUniqueID
public void setKeepOnlySingleUniqueID(boolean value)
Sets whether to keep only a single instance of the unique ID attribute.
- Parameters: value - true to keep only a single instance
-
getKeepOnlySingleUniqueID
public boolean getKeepOnlySingleUniqueID()
Returns whether to keep only a single instance of the unique ID attribute.
- Returns: true if only a single instance is kept
-
keepOnlySingleUniqueIDTipText
public String keepOnlySingleUniqueIDTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setStrict
public void setStrict(boolean value)
Sets whether to enforce uniqueness in IDs.
- Parameters: value - true to enforce unique IDs
-
getStrict
public boolean getStrict()
Returns whether to enforce uniqueness in IDs.
- Returns: true if uniqueness is enforced
-
strictTipText
public String strictTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
getQuickInfo
public String getQuickInfo()
Returns quick info about the actor, which will be displayed in the GUI.
- Specified by: getQuickInfo in interface adams.flow.core.Actor
- Specified by: getQuickInfo in interface adams.core.QuickInfoSupporter
- Overrides: getQuickInfo in class adams.flow.core.AbstractActor
- Returns: null if no info is available, otherwise a short string
-
accepts
public Class[] accepts()
Returns the classes that the consumer accepts.
- Specified by: accepts in interface adams.flow.core.InputConsumer
- Returns: java.lang.String[].class, java.io.File[].class, weka.core.Instance[].class, weka.core.Instances[].class
-
generates
public Class[] generates()
Returns the class of objects that it generates.
- Specified by: generates in interface adams.flow.core.OutputProducer
- Returns: weka.core.Instances.class
-
excludeAttributes
protected weka.core.Instances excludeAttributes(weka.core.Instances inst)
Excludes attributes from the data.
- Parameters: inst - the data to process
- Returns: the processed data
-
createPrefix
protected String createPrefix(weka.core.Instances inst, int index)
Generates the prefix for the dataset/index.
- Parameters:
  inst - the current dataset
  index - the index
- Returns: the prefix
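Putting the prefix-related options together, a prefixed attribute name has the shape prefix string, optional index, separator, original name. The sketch below is a hypothetical stand-in for createPrefix that takes a plain relation name instead of weka.core.Instances; it assumes '@' may appear anywhere in the prefix string and that the 0-based dataset index is displayed 1-based, neither of which is confirmed by this page.

```java
// Hypothetical sketch of prefix generation; mirrors the documented options
// (-prefix, -add-index, -prefix-separator) but is not the ADAMS source.
class PrefixSketch {

  static String createPrefix(String relationName, int index, String prefix,
                             boolean addIndex, String separator) {
    // '@' is documented as a placeholder for the relation name
    StringBuilder result = new StringBuilder(prefix.replace("@", relationName));
    if (addIndex)
      result.append(index + 1);  // assumption: 0-based index shown 1-based
    result.append(separator);    // separates the prefix from the attribute name
    return result.toString();
  }

  static String prefixAttribute(String prefix, String attributeName) {
    return prefix + attributeName;
  }
}
```

Under these assumptions, the defaults (prefix dataset, separator -) with -add-index enabled would turn petalwidth from the first dataset into dataset1-petalwidth.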
-
prefixAttributes
protected weka.core.Instances prefixAttributes(weka.core.Instances inst, int index)
Prefixes the attributes.
- Parameters:
  index - the index of the dataset
  inst - the data to process
- Returns: the processed data
-
prepareData
protected weka.core.Instances prepareData(weka.core.Instances inst, int index)
Prepares the data (prefixing attributes, removing columns, etc.) before merging it.
- Parameters:
  inst - the data to process
  index - the 0-based index of the dataset being processed
- Returns: the prepared data
-
updateIDs
protected void updateIDs(int instIndex, weka.core.Instances inst, HashSet ids)
Updates the IDs in the hash set with the ones stored in the ID attribute of the provided dataset.
- Parameters:
  instIndex - the dataset index
  inst - the dataset to obtain the IDs from
  ids - the hash set to store the IDs in
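updateIDs gathers the values of each dataset's ID column into a shared set, and -strict asks for those IDs to be truly unique. The sketch below assumes uniqueness is checked within each dataset (the same ID is expected to recur across datasets, since that is what links rows) and models IDs as strings, whereas the actor also supports numeric ID attributes. UpdateIdsSketch is a hypothetical name, not part of ADAMS.

```java
import java.util.*;

// Hypothetical sketch of collecting IDs from one dataset's ID column.
// Assumptions: IDs are strings; -strict checks uniqueness within a dataset.
class UpdateIdsSketch {

  static void updateIDs(int instIndex, List<String> idColumn, Set<String> ids,
                        boolean strict) {
    Set<String> seen = new HashSet<>();
    for (String id : idColumn) {
      // Set.add returns false if the ID already occurred in this dataset
      if (!seen.add(id) && strict)
        throw new IllegalStateException(
            "Dataset #" + (instIndex + 1) + ": ID '" + id + "' is not unique");
      ids.add(id);  // union of IDs across all datasets
    }
  }
}
```

Calling it once per dataset with the same set yields the union of all IDs, which a merge step can then iterate over.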
-
merge
protected weka.core.Instances merge(weka.core.Instances[] orig, weka.core.Instances[] inst, HashSet ids)
Merges the datasets based on the collected IDs.
- Parameters:
  orig - the original datasets
  inst - the processed datasets to merge into one
  ids - the IDs for identifying the rows
- Returns: the merged dataset
-
doExecute
protected String doExecute()
Executes the flow item.
- Specified by: doExecute in class adams.flow.core.AbstractActor
- Returns: null if everything is fine, otherwise an error message