Package adams.flow.transformer
Class WekaInstancesMerge
All Implemented Interfaces:
AdditionalInformationHandler, ClassCrossReference, CleanUpHandler, CrossReference, Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<Actor>, SizeOfHandler, Stoppable, StoppableWithFeedback, VariablesInspectionHandler, VariableChangeListener, Actor, ErrorHandler, InputConsumer, OutputProducer, WekaMergeInstancesActor, Serializable, Comparable
public class WekaInstancesMerge extends AbstractTransformer implements WekaMergeInstancesActor, ClassCrossReference
Merges multiple datasets, either from file or using Instances/Instance objects.
If no 'ID' attribute is named, then all datasets must contain the same number of rows.
Attributes can be excluded from ending up in the final dataset via a regular expression. They can also be prefixed with name and/or index.
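The row-count requirement for merging without an ID attribute can be illustrated with a small stand-alone sketch. It does not use the ADAMS or Weka classes: datasets are modelled as plain lists of rows, and the class name RowWiseMergeSketch and method mergeByRow are hypothetical names chosen for this illustration only. It assumes that, without an ID, rows are paired purely by position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of ADAMS: merges datasets side by side by row position.
final class RowWiseMergeSketch {
    private RowWiseMergeSketch() {}

    /** Each dataset is a list of rows; each row is a list of cell values. */
    static List<List<String>> mergeByRow(List<List<List<String>>> datasets) {
        List<List<String>> merged = new ArrayList<>();
        if (datasets.isEmpty()) {
            return merged;
        }
        int rows = datasets.get(0).size();
        // Without an ID attribute there is nothing to align rows on except their position,
        // so differing row counts cannot be merged.
        for (List<List<String>> dataset : datasets) {
            if (dataset.size() != rows) {
                throw new IllegalArgumentException(
                    "All datasets must have " + rows + " rows, found " + dataset.size());
            }
        }
        for (int r = 0; r < rows; r++) {
            List<String> row = new ArrayList<>();
            for (List<List<String>> dataset : datasets) {
                row.addAll(dataset.get(r)); // append this dataset's columns to the merged row
            }
            merged.add(row);
        }
        return merged;
    }
}
```

Passing two datasets with two rows each yields two merged rows that contain the columns of both; passing datasets with different row counts raises an IllegalArgumentException in this sketch.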
Input/output:
- accepts:
java.lang.String[]
java.io.File[]
weka.core.Instance[]
weka.core.Instances[]
- generates:
weka.core.Instances
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-name <java.lang.String> (property: name) The name of the actor. default: WekaInstancesMerge
-annotation <adams.core.base.BaseAnnotation> (property: annotations) The annotations to attach to this actor. default:
-skip <boolean> (property: skip) If set to true, transformation is skipped and the input token is just forwarded as it is. default: false
-stop-flow-on-error <boolean> (property: stopFlowOnError) If set to true, the flow execution at this level gets stopped in case this actor encounters an error; the error gets propagated; useful for critical actors. default: false
-silent <boolean> (property: silent) If enabled, then no errors are output in the console; Note: the enclosing actor handler must have this enabled as well. default: false
-use-prefix <boolean> (property: usePrefix) Whether to prefix the attribute names of each dataset with an index and an optional string. default: false
-add-index <boolean> (property: addIndex) Whether to add the index of the dataset to the prefix. default: false
-remove <boolean> (property: remove) If true, only keep instances where data is available from each source. default: false
-prefix <java.lang.String> (property: prefix) The optional prefix string to prefix the index number with (in case prefixes are used); '@' is a placeholder for the relation name. default: dataset
-prefix-separator <java.lang.String> (property: prefixSeparator) The separator string between the generated prefix and the original attribute name. default: -
-exclude-atts <java.lang.String> (property: excludedAttributes) The regular expression used on the attribute names, to determine whether an attribute should be excluded or not (matching sense can be inverted); leave empty to include all attributes. default:
-invert <boolean> (property: invertMatchingSense) Whether to invert the matching sense of excluding attributes, i.e., the regular expression is used for including attributes. default: false
-unique-id <java.lang.String> (property: uniqueID) The name of the attribute (string/numeric) used for uniquely identifying rows among the datasets. default:
-keep-only-single-unique-id <boolean> (property: keepOnlySingleUniqueID) If enabled, only a single instance of the unique ID attribute is kept. default: false
-strict <boolean> (property: strict) If enabled, ensures that IDs in unique ID column are truly unique. default: false
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields (modifier and type, field name: description)
protected boolean m_AddIndex: whether to add the index to the prefix.
protected int m_AttType: the attribute type of the ID attribute.
protected String m_ExcludedAttributes: regular expression for excluding attributes from the datasets.
protected boolean m_InvertMatchingSense: whether to invert the matching sense for excluding attributes.
protected boolean m_KeepOnlySingleUniqueID: whether to keep only a single instance of the unique ID attribute.
protected String m_Prefix: the additional prefix name to use, apart from the index.
protected String m_PrefixSeparator: the separator between index and actual attribute name.
protected boolean m_Remove: whether to remove when not all present.
protected boolean m_Strict: whether to fail if IDs not unique.
protected String m_UniqueID: the string or numeric attribute to use as unique identifier for rows.
protected List<String> m_UniqueIDAtts: the unique ID attributes.
protected boolean m_UsePrefix: whether to prefix the attribute names of each dataset with an index.
-
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
-
Fields inherited from class adams.flow.core.AbstractActor
m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
Fields inherited from interface adams.flow.core.Actor
FILE_EXTENSION, FILE_EXTENSION_GZ
-
-
Constructor Summary
Constructors
WekaInstancesMerge()
-
Method Summary
All Methods (modifier and type, method: description)
Class[] accepts(): Returns the class that the consumer accepts.
String addIndexTipText(): Returns the tip text for this property.
protected String createPrefix(weka.core.Instances inst, int index): Generates the prefix for the dataset/index.
void defineOptions(): Adds options to the internal list of options.
protected String doExecute(): Executes the flow item.
protected weka.core.Instances excludeAttributes(weka.core.Instances inst): Excludes attributes from the data.
String excludedAttributesTipText(): Returns the tip text for this property.
Class[] generates(): Returns the class of objects that it generates.
boolean getAddIndex(): Returns whether to add the dataset index number to the prefix.
Class[] getClassCrossReferences(): Returns the cross-referenced classes.
String getExcludedAttributes(): Returns the regular expression for excluding attributes.
boolean getInvertMatchingSense(): Returns whether to invert the matching sense.
boolean getKeepOnlySingleUniqueID(): Returns whether to keep only a single instance of the unique ID attribute.
String getPrefix(): Returns the optional prefix string.
String getPrefixSeparator(): Returns the prefix separator string.
String getQuickInfo(): Returns a quick info about the actor, which will be displayed in the GUI.
boolean getRemove(): Returns whether to remove if not all present.
boolean getStrict(): Returns whether to enforce uniqueness in IDs.
String getUniqueID(): Returns the attribute (string/numeric) to use for uniquely identifying rows.
boolean getUsePrefix(): Returns whether to use prefixes.
String globalInfo(): Returns a string describing the object.
String invertMatchingSenseTipText(): Returns the tip text for this property.
String keepOnlySingleUniqueIDTipText(): Returns the tip text for this property.
protected weka.core.Instances merge(weka.core.Instances[] orig, weka.core.Instances[] inst, HashSet ids): Merges the datasets based on the collected IDs.
protected weka.core.Instances prefixAttributes(weka.core.Instances inst, int index): Prefixes the attributes.
String prefixSeparatorTipText(): Returns the tip text for this property.
String prefixTipText(): Returns the tip text for this property.
protected weka.core.Instances prepareData(weka.core.Instances inst, int index): Prepares the data, prefixing attributes, removing columns, etc, before merging it.
String removeTipText(): Returns the tip text for this property.
void setAddIndex(boolean value): Sets whether to add the dataset index number to the prefix.
void setExcludedAttributes(String value): Sets the regular expression for excluding attributes.
void setInvertMatchingSense(boolean value): Sets whether to invert the matching sense.
void setKeepOnlySingleUniqueID(boolean value): Sets whether to keep only a single instance of the unique ID attribute.
void setPrefix(String value): Sets the optional prefix string.
void setPrefixSeparator(String value): Sets the prefix separator string.
void setRemove(boolean value): Sets whether to remove if not all present.
void setStrict(boolean value): Sets whether to enforce uniqueness in IDs.
void setUniqueID(String value): Sets the attribute (string/numeric) to use for uniquely identifying rows.
void setUsePrefix(boolean value): Sets whether to use prefixes.
String strictTipText(): Returns the tip text for this property.
String uniqueIDTipText(): Returns the tip text for this property.
protected void updateIDs(int instIndex, weka.core.Instances inst, HashSet ids): Updates the IDs in the hashset with the ones stored in the ID attribute of the provided dataset.
String usePrefixTipText(): Returns the tip text for this property.
-
Methods inherited from class adams.flow.transformer.AbstractTransformer
backupState, currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute, restoreState, wrapUp
-
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, initialize, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, pruneBackup, reset, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.flow.core.Actor
cleanUp, compareTo, destroy, equals, execute, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged, wrapUp
-
Methods inherited from interface adams.core.AdditionalInformationHandler
getAdditionalInformation
-
Methods inherited from interface adams.flow.core.InputConsumer
currentInput, hasInput, input
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel, setLoggingLevel
-
Methods inherited from interface adams.core.logging.LoggingSupporter
getLogger, isLoggingEnabled
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager
-
Methods inherited from interface adams.flow.core.OutputProducer
hasPendingOutput, output
-
Methods inherited from interface adams.core.VariablesInspectionHandler
canInspectOptions
-
-
-
-
Field Detail
-
m_UsePrefix
protected boolean m_UsePrefix
whether to prefix the attribute names of each dataset with an index.
-
m_AddIndex
protected boolean m_AddIndex
whether to add the index to the prefix.
-
m_Remove
protected boolean m_Remove
whether to remove instances for which not all sources provide data.
-
m_Prefix
protected String m_Prefix
the additional prefix name to use, apart from the index.
-
m_PrefixSeparator
protected String m_PrefixSeparator
the separator between index and actual attribute name.
-
m_ExcludedAttributes
protected String m_ExcludedAttributes
regular expression for excluding attributes from the datasets.
-
m_InvertMatchingSense
protected boolean m_InvertMatchingSense
whether to invert the matching sense for excluding attributes.
-
m_UniqueID
protected String m_UniqueID
the string or numeric attribute to use as unique identifier for rows.
-
m_KeepOnlySingleUniqueID
protected boolean m_KeepOnlySingleUniqueID
whether to keep only a single instance of the unique ID attribute.
-
m_Strict
protected boolean m_Strict
whether to fail if IDs not unique.
-
m_AttType
protected int m_AttType
the attribute type of the ID attribute.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the gui
-
getClassCrossReferences
public Class[] getClassCrossReferences()
Returns the cross-referenced classes.
- Specified by:
getClassCrossReferences in interface ClassCrossReference
- Returns:
- the classes
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractActor
-
setRemove
public void setRemove(boolean value)
Sets whether to remove instances for which not all sources provide data.
- Parameters:
value - if true, instances that lack data from any source are removed
-
getRemove
public boolean getRemove()
Returns whether instances for which not all sources provide data are removed.
- Returns:
- true if instances that lack data from any source are removed
-
removeTipText
public String removeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUsePrefix
public void setUsePrefix(boolean value)
Sets whether to use prefixes.
- Parameters:
value - if true then the attributes will get prefixed
-
getUsePrefix
public boolean getUsePrefix()
Returns whether to use prefixes.
- Returns:
- true if the attributes will get prefixed
-
usePrefixTipText
public String usePrefixTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setAddIndex
public void setAddIndex(boolean value)
Sets whether to add the dataset index number to the prefix.
- Parameters:
value - if true then the index will be used in the prefix
-
getAddIndex
public boolean getAddIndex()
Returns whether to add the dataset index number to the prefix.
- Returns:
- true if the index will be used in the prefix
-
addIndexTipText
public String addIndexTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setPrefix
public void setPrefix(String value)
Sets the optional prefix string.
- Parameters:
value - the optional prefix string
-
getPrefix
public String getPrefix()
Returns the optional prefix string.
- Returns:
- the optional prefix string
-
prefixTipText
public String prefixTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setPrefixSeparator
public void setPrefixSeparator(String value)
Sets the prefix separator string.
- Parameters:
value - the prefix separator string
-
getPrefixSeparator
public String getPrefixSeparator()
Returns the prefix separator string.
- Returns:
- the prefix separator string
-
prefixSeparatorTipText
public String prefixSeparatorTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setExcludedAttributes
public void setExcludedAttributes(String value)
Sets the regular expression for excluding attributes.
- Parameters:
value - the regular expression
-
getExcludedAttributes
public String getExcludedAttributes()
Returns the regular expression for excluding attributes.
- Returns:
- the regular expression
-
excludedAttributesTipText
public String excludedAttributesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setInvertMatchingSense
public void setInvertMatchingSense(boolean value)
Sets whether to invert the matching sense.
- Parameters:
value - if true then matching sense gets inverted
-
getInvertMatchingSense
public boolean getInvertMatchingSense()
Returns whether to invert the matching sense.
- Returns:
- true if the matching sense gets inverted
-
invertMatchingSenseTipText
public String invertMatchingSenseTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUniqueID
public void setUniqueID(String value)
Sets the attribute (string/numeric) to use for uniquely identifying rows.
- Parameters:
value - the attribute name
-
getUniqueID
public String getUniqueID()
Returns the attribute (string/numeric) to use for uniquely identifying rows.
- Returns:
- the attribute name
-
uniqueIDTipText
public String uniqueIDTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setKeepOnlySingleUniqueID
public void setKeepOnlySingleUniqueID(boolean value)
Sets whether to keep only a single instance of the unique ID attribute.
- Parameters:
value - true if to keep only single instance
-
getKeepOnlySingleUniqueID
public boolean getKeepOnlySingleUniqueID()
Returns whether to keep only a single instance of the unique ID attribute.
- Returns:
- true if to keep only single instance
-
keepOnlySingleUniqueIDTipText
public String keepOnlySingleUniqueIDTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setStrict
public void setStrict(boolean value)
Sets whether to enforce uniqueness in IDs.
- Parameters:
value - true if to enforce
-
getStrict
public boolean getStrict()
Returns whether to enforce uniqueness in IDs.
- Returns:
- true if to enforce
-
strictTipText
public String strictTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getQuickInfo
public String getQuickInfo()
Returns a quick info about the actor, which will be displayed in the GUI.
- Specified by:
getQuickInfo in interface Actor
- Specified by:
getQuickInfo in interface QuickInfoSupporter
- Overrides:
getQuickInfo in class AbstractActor
- Returns:
- null if no info available, otherwise short string
-
accepts
public Class[] accepts()
Returns the class that the consumer accepts.
- Specified by:
accepts in interface InputConsumer
- Returns:
- java.lang.String[].class, java.io.File[].class, weka.core.Instance[].class, weka.core.Instances[].class
-
generates
public Class[] generates()
Returns the class of objects that it generates.
- Specified by:
generates in interface OutputProducer
- Returns:
- weka.core.Instances.class
-
excludeAttributes
protected weka.core.Instances excludeAttributes(weka.core.Instances inst)
Excludes attributes from the data.
- Parameters:
inst - the data to process
- Returns:
- the processed data
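The interplay of the excludedAttributes regular expression and the invertMatchingSense flag can be sketched in plain Java. This is a stand-alone illustration, not the ADAMS implementation: it works on attribute names only, the class ExcludeSketch and method keptAttributes are hypothetical, and it assumes the expression must match the whole attribute name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of ADAMS: decides which attribute names survive.
final class ExcludeSketch {
    private ExcludeSketch() {}

    static List<String> keptAttributes(List<String> names, String regex, boolean invert) {
        // An empty expression means "include all attributes".
        if (regex.isEmpty()) {
            return new ArrayList<>(names);
        }
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            // Assumption: full-name match, as with String.matches().
            boolean matches = pattern.matcher(name).matches();
            // Normal sense: matching names are dropped.
            // Inverted sense: only matching names are kept.
            if (matches == invert) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

With the names id, temp_a, temp_b and label and the expression temp_.*, the normal sense keeps id and label, while the inverted sense keeps temp_a and temp_b.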
-
createPrefix
protected String createPrefix(weka.core.Instances inst, int index)
Generates the prefix for the dataset/index.
- Parameters:
inst - the current dataset
index - the index
- Returns:
- the prefix
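How the prefix, addIndex and prefixSeparator options might combine can be sketched as follows. This is one plausible reading of the option descriptions, not the ADAMS source: the class PrefixSketch is hypothetical, the relation name is passed in as a plain string, and both the 1-based index in the output and the exact placement of the separator are assumptions.

```java
// Hypothetical illustration, not part of ADAMS: builds an attribute-name prefix.
final class PrefixSketch {
    private PrefixSketch() {}

    static String createPrefix(String prefix, String relationName,
                               boolean addIndex, int index, String separator) {
        // '@' is the placeholder for the relation name of the dataset.
        StringBuilder result = new StringBuilder(prefix.replace("@", relationName));
        if (addIndex) {
            if (result.length() > 0) {
                result.append(separator);
            }
            result.append(index + 1); // assumption: 0-based index shown as 1-based
        }
        // The separator sits between the generated prefix and the original attribute name.
        result.append(separator);
        return result.toString();
    }

    /** Applies the prefix to a single attribute name. */
    static String prefixName(String attributeName, String prefix, String relationName,
                             boolean addIndex, int index, String separator) {
        return createPrefix(prefix, relationName, addIndex, index, separator) + attributeName;
    }
}
```

Under these assumptions, the default prefix dataset with addIndex enabled and separator - turns the attribute petalwidth of the first dataset into dataset-1-petalwidth, and the prefix @ on a relation named iris without an index gives iris-petalwidth.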
-
prefixAttributes
protected weka.core.Instances prefixAttributes(weka.core.Instances inst, int index)
Prefixes the attributes.
- Parameters:
index - the index of the dataset
inst - the data to process
- Returns:
- the processed data
-
prepareData
protected weka.core.Instances prepareData(weka.core.Instances inst, int index)
Prepares the data, prefixing attributes, removing columns, etc, before merging it.
- Parameters:
inst - the data to process
index - the 0-based index of the dataset being processed
- Returns:
- the prepared data
-
updateIDs
protected void updateIDs(int instIndex, weka.core.Instances inst, HashSet ids)
Updates the IDs in the hashset with the ones stored in the ID attribute of the provided dataset.
- Parameters:
instIndex - the dataset index
inst - the dataset to obtain the IDs from
ids - the hashset to store the IDs in
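Collecting IDs into a shared set, together with the uniqueness check that the strict option describes, might look like the sketch below. It is a stand-alone illustration rather than the ADAMS code: the ID column is a plain list of strings, the class IdCollectorSketch is hypothetical, and unlike the void method documented here it returns an error message (null on success) so that the strict case can be shown.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not part of ADAMS: gathers the IDs of one dataset.
final class IdCollectorSketch {
    private IdCollectorSketch() {}

    /**
     * Adds the IDs of one dataset to the shared set.
     * Returns null if everything is fine, otherwise an error message.
     */
    static String updateIds(int datasetIndex, List<String> idColumn,
                            Set<String> ids, boolean strict) {
        Set<String> seenHere = new HashSet<>();
        for (String id : idColumn) {
            // Set.add returns false if the ID was already present in this dataset.
            if (!seenHere.add(id) && strict) {
                return "Dataset #" + (datasetIndex + 1) + ": duplicate ID '" + id + "'";
            }
        }
        // Only touch the shared set once the dataset passed the check.
        ids.addAll(seenHere);
        return null;
    }
}
```

Calling updateIds once per dataset leaves the shared set holding the union of all IDs, which is the information an ID-based merge needs to line up rows.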
-
merge
protected weka.core.Instances merge(weka.core.Instances[] orig, weka.core.Instances[] inst, HashSet ids)
Merges the datasets based on the collected IDs.
- Parameters:
orig - the original datasets
inst - the processed datasets to merge into one
ids - the IDs for identifying the rows
- Returns:
- the merged dataset
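An ID-based merge, including the effect of the remove option, can be sketched without Weka as follows. This is an illustration under stated assumptions, not the actual algorithm: each dataset is modelled as a map from ID to that row's attribute values, the class IdMergeSketch is hypothetical, IDs are sorted only to make the output deterministic, and attributes a source does not provide are simply absent from the merged row (standing in for missing values).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration, not part of ADAMS: joins rows from several datasets by ID.
final class IdMergeSketch {
    private IdMergeSketch() {}

    /** Each dataset maps an ID to that row's values (attribute name to value). */
    static Map<String, Map<String, String>> merge(
            List<Map<String, Map<String, String>>> datasets, boolean remove) {
        // Union of all IDs across the datasets.
        TreeSet<String> ids = new TreeSet<>();
        for (Map<String, Map<String, String>> dataset : datasets) {
            ids.addAll(dataset.keySet());
        }
        Map<String, Map<String, String>> merged = new LinkedHashMap<>();
        for (String id : ids) {
            Map<String, String> row = new LinkedHashMap<>();
            boolean complete = true;
            for (Map<String, Map<String, String>> dataset : datasets) {
                Map<String, String> part = dataset.get(id);
                if (part == null) {
                    complete = false; // this source has no row for the ID
                } else {
                    row.putAll(part); // identical attribute names would overwrite each other
                }
            }
            // With remove enabled, only rows present in every source are kept.
            if (complete || !remove) {
                merged.put(id, row);
            }
        }
        return merged;
    }
}
```

The overwrite noted in the comment is one reason prefixing attribute names per dataset can be useful when sources share attribute names.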
-
doExecute
protected String doExecute()
Executes the flow item.
- Specified by:
doExecute in class AbstractActor
- Returns:
- null if everything is fine, otherwise error message
-
-