Package adams.flow.transformer
Class SpreadSheetMerge
-
- All Implemented Interfaces:
AdditionalInformationHandler, ClassCrossReference, CleanUpHandler, CrossReference, Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<Actor>, SizeOfHandler, Stoppable, StoppableWithFeedback, VariablesInspectionHandler, VariableChangeListener, Actor, ErrorHandler, InputConsumer, OutputProducer, SpreadSheetMergeActor, Serializable, Comparable
public class SpreadSheetMerge extends AbstractTransformer implements SpreadSheetMergeActor, ClassCrossReference
Merges two or more spreadsheets. The merge can be done by using a common key-column or by simply putting the spreadsheets side-by-side.
Input/output:
- accepts:
adams.data.spreadsheet.SpreadSheet[]
- generates:
adams.data.spreadsheet.SpreadSheet
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-name <java.lang.String> (property: name) The name of the actor. default: SpreadSheetMerge
-annotation <adams.core.base.BaseAnnotation> (property: annotations) The annotations to attach to this actor. default:
-skip <boolean> (property: skip) If set to true, transformation is skipped and the input token is just forwarded as it is. default: false
-stop-flow-on-error <boolean> (property: stopFlowOnError) If set to true, the flow execution at this level gets stopped in case this actor encounters an error; the error gets propagated; useful for critical actors. default: false
-silent <boolean> (property: silent) If enabled, then no errors are output in the console; Note: the enclosing actor handler must have this enabled as well. default: false
-use-prefix <boolean> (property: usePrefix) Whether to prefix the attribute names of each dataset with an index and an optional string. default: false
-add-index <boolean> (property: addIndex) Whether to add the index of the dataset to the prefix. default: false
-remove <boolean> (property: remove) If true, only keep instances where data is available from each source. default: false
-prefix <java.lang.String> (property: prefix) The optional prefix string to prefix the index number with (in case prefixes are used). default: dataset
-prefix-separator <java.lang.String> (property: prefixSeparator) The separator string between the generated prefix and the original attribute name. default: -
-exclude-atts <java.lang.String> (property: excludedAttributes) The regular expression used on the attribute names, to determine whether an attribute should be excluded or not (matching sense can be inverted); leave empty to include all attributes. default:
-invert <boolean> (property: invertMatchingSense) Whether to invert the matching sense of excluding attributes, i.e., the regular expression is used for including attributes. default: false
-unique-id <java.lang.String> (property: uniqueID) The name of the column used for uniquely identifying rows among the spreadsheets. default:
-keep-only-single-unique-id <boolean> (property: keepOnlySingleUniqueID) If enabled, only a single instance of the unique ID attribute is kept. default: false
-strict <boolean> (property: strict) If enabled, ensures that IDs in unique ID column are truly unique. default: false
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields (modifier and type, field, description):
protected boolean m_AddIndex - whether to add the index to the prefix.
protected String m_ExcludedAttributes - regular expression for excluding attributes from the datasets.
protected boolean m_InvertMatchingSense - whether to invert the matching sense for excluding attributes.
protected boolean m_KeepOnlySingleUniqueID - whether to keep only a single instance of the unique ID attribute.
protected String m_Prefix - the additional prefix name to use, apart from the index.
protected String m_PrefixSeparator - the separator between index and actual attribute name.
protected boolean m_Remove - whether to remove when not all present.
protected boolean m_Strict - whether to fail if IDs not unique.
protected String m_UniqueID - the string or numeric attribute to use as unique identifier for rows.
protected List<String> m_UniqueIDAtts - the unique ID attributes.
protected boolean m_UsePrefix - whether to prefix the attribute names of each dataset with an index.
-
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
-
Fields inherited from class adams.flow.core.AbstractActor
m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
Fields inherited from interface adams.flow.core.Actor
FILE_EXTENSION, FILE_EXTENSION_GZ
-
-
Constructor Summary
Constructors
SpreadSheetMerge()
-
Method Summary
Methods (modifier and type, method, description):
Class[] accepts() - Returns the class that the consumer accepts.
String addIndexTipText() - Returns the tip text for this property.
protected String createPrefix(int index) - Generates the prefix string.
void defineOptions() - Adds options to the internal list of options.
protected String doExecute() - Executes the flow item.
protected SpreadSheet excludeAttributes(SpreadSheet sheet) - Excludes columns from the data.
String excludedAttributesTipText() - Returns the tip text for this property.
Class[] generates() - Returns the class of objects that it generates.
boolean getAddIndex() - Returns whether to add the dataset index number to the prefix.
Class[] getClassCrossReferences() - Returns the cross-referenced classes.
String getExcludedAttributes() - Returns the regular expression for excluding attributes.
boolean getInvertMatchingSense() - Returns whether to invert the matching sense.
boolean getKeepOnlySingleUniqueID() - Returns whether to keep only a single instance of the unique ID attribute.
String getPrefix() - Returns the optional prefix string.
String getPrefixSeparator() - Returns the prefix separator string.
String getQuickInfo() - Returns a quick info about the actor, which will be displayed in the GUI.
boolean getRemove() - Returns whether to remove rows for which not all spreadsheets provide data.
boolean getStrict() - Returns whether to enforce uniqueness in IDs.
String getUniqueID() - Returns the attribute (string/numeric) to use for uniquely identifying rows.
boolean getUsePrefix() - Returns whether to use prefixes.
String globalInfo() - Returns a string describing the object.
String invertMatchingSenseTipText() - Returns the tip text for this property.
String keepOnlySingleUniqueIDTipText() - Returns the tip text for this property.
protected SpreadSheet merge(SpreadSheet[] orig, SpreadSheet[] sheets, HashSet ids) - Merges the datasets based on the collected IDs.
protected SpreadSheet prefixColumns(SpreadSheet inst, int index) - Prefixes the columns.
String prefixSeparatorTipText() - Returns the tip text for this property.
String prefixTipText() - Returns the tip text for this property.
protected SpreadSheet prepareData(SpreadSheet inst, int index) - Prepares the data, prefixing columns, removing columns, etc., before merging it.
String removeTipText() - Returns the tip text for this property.
void setAddIndex(boolean value) - Sets whether to add the dataset index number to the prefix.
void setExcludedAttributes(String value) - Sets the regular expression for excluding attributes.
void setInvertMatchingSense(boolean value) - Sets whether to invert the matching sense.
void setKeepOnlySingleUniqueID(boolean value) - Sets whether to keep only a single instance of the unique ID attribute.
void setPrefix(String value) - Sets the optional prefix string.
void setPrefixSeparator(String value) - Sets the prefix separator string.
void setRemove(boolean value) - Sets whether to remove rows for which not all spreadsheets provide data.
void setStrict(boolean value) - Sets whether to enforce uniqueness in IDs.
void setUniqueID(String value) - Sets the attribute (string/numeric) to use for uniquely identifying rows.
void setUsePrefix(boolean value) - Sets whether to use prefixes.
String strictTipText() - Returns the tip text for this property.
String uniqueIDTipText() - Returns the tip text for this property.
protected void updateIDs(int sheetIndex, SpreadSheet inst, HashSet ids) - Updates the IDs in the hashset with the ones stored in the ID column of the provided spreadsheet.
String usePrefixTipText() - Returns the tip text for this property.
-
Methods inherited from class adams.flow.transformer.AbstractTransformer
backupState, currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute, restoreState, wrapUp
-
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, initialize, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, pruneBackup, reset, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.flow.core.Actor
cleanUp, compareTo, destroy, equals, execute, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged, wrapUp
-
Methods inherited from interface adams.core.AdditionalInformationHandler
getAdditionalInformation
-
Methods inherited from interface adams.flow.core.InputConsumer
currentInput, hasInput, input
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel, setLoggingLevel
-
Methods inherited from interface adams.core.logging.LoggingSupporter
getLogger, isLoggingEnabled
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager
-
Methods inherited from interface adams.flow.core.OutputProducer
hasPendingOutput, output
-
Methods inherited from interface adams.core.VariablesInspectionHandler
canInspectOptions
-
-
-
-
Field Detail
-
m_UsePrefix
protected boolean m_UsePrefix
whether to prefix the attribute names of each dataset with an index.
-
m_AddIndex
protected boolean m_AddIndex
whether to add the index to the prefix.
-
m_Remove
protected boolean m_Remove
whether to remove when not all present.
-
m_Prefix
protected String m_Prefix
the additional prefix name to use, apart from the index.
-
m_PrefixSeparator
protected String m_PrefixSeparator
the separator between index and actual attribute name.
-
m_ExcludedAttributes
protected String m_ExcludedAttributes
regular expression for excluding attributes from the datasets.
-
m_InvertMatchingSense
protected boolean m_InvertMatchingSense
whether to invert the matching sense for excluding attributes.
-
m_UniqueID
protected String m_UniqueID
the string or numeric attribute to use as unique identifier for rows.
-
m_KeepOnlySingleUniqueID
protected boolean m_KeepOnlySingleUniqueID
whether to keep only a single instance of the unique ID attribute.
-
m_Strict
protected boolean m_Strict
whether to fail if IDs not unique.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
a description suitable for displaying in the GUI
-
getClassCrossReferences
public Class[] getClassCrossReferences()
Returns the cross-referenced classes.
- Specified by:
getClassCrossReferences in interface ClassCrossReference
- Returns:
the classes
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractActor
-
setRemove
public void setRemove(boolean value)
Sets whether to remove rows for which not all spreadsheets provide data.
- Parameters:
value - if true, a row is removed unless every spreadsheet contributes data to it
-
getRemove
public boolean getRemove()
Returns whether to remove rows for which not all spreadsheets provide data.
- Returns:
true if a row is removed unless every spreadsheet contributes data to it
-
removeTipText
public String removeTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUsePrefix
public void setUsePrefix(boolean value)
Sets whether to use prefixes.
- Parameters:
value - if true then the attributes will get prefixed
-
getUsePrefix
public boolean getUsePrefix()
Returns whether to use prefixes.
- Returns:
true if the attributes will get prefixed
-
usePrefixTipText
public String usePrefixTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setAddIndex
public void setAddIndex(boolean value)
Sets whether to add the dataset index number to the prefix.
- Parameters:
value - if true then the index will be used in the prefix
-
getAddIndex
public boolean getAddIndex()
Returns whether to add the dataset index number to the prefix.
- Returns:
true if the index will be used in the prefix
-
addIndexTipText
public String addIndexTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setPrefix
public void setPrefix(String value)
Sets the optional prefix string.
- Parameters:
value - the optional prefix string
-
getPrefix
public String getPrefix()
Returns the optional prefix string.
- Returns:
the optional prefix string
-
prefixTipText
public String prefixTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setPrefixSeparator
public void setPrefixSeparator(String value)
Sets the prefix separator string.
- Parameters:
value - the prefix separator string
-
getPrefixSeparator
public String getPrefixSeparator()
Returns the prefix separator string.
- Returns:
the prefix separator string
-
prefixSeparatorTipText
public String prefixSeparatorTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setExcludedAttributes
public void setExcludedAttributes(String value)
Sets the regular expression for excluding attributes.
- Parameters:
value - the regular expression
-
getExcludedAttributes
public String getExcludedAttributes()
Returns the regular expression for excluding attributes.
- Returns:
the regular expression
-
excludedAttributesTipText
public String excludedAttributesTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setInvertMatchingSense
public void setInvertMatchingSense(boolean value)
Sets whether to invert the matching sense.
- Parameters:
value - if true then matching sense gets inverted
-
getInvertMatchingSense
public boolean getInvertMatchingSense()
Returns whether to invert the matching sense.
- Returns:
true if the matching sense gets inverted
-
invertMatchingSenseTipText
public String invertMatchingSenseTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUniqueID
public void setUniqueID(String value)
Sets the attribute (string/numeric) to use for uniquely identifying rows.
- Parameters:
value - the attribute name
-
getUniqueID
public String getUniqueID()
Returns the attribute (string/numeric) to use for uniquely identifying rows.
- Returns:
the attribute name
-
uniqueIDTipText
public String uniqueIDTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setKeepOnlySingleUniqueID
public void setKeepOnlySingleUniqueID(boolean value)
Sets whether to keep only a single instance of the unique ID attribute.
- Parameters:
value - true to keep only a single instance
-
getKeepOnlySingleUniqueID
public boolean getKeepOnlySingleUniqueID()
Returns whether to keep only a single instance of the unique ID attribute.
- Returns:
true if only a single instance is kept
-
keepOnlySingleUniqueIDTipText
public String keepOnlySingleUniqueIDTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
setStrict
public void setStrict(boolean value)
Sets whether to enforce uniqueness in IDs.
- Parameters:
value - true to enforce uniqueness
-
getStrict
public boolean getStrict()
Returns whether to enforce uniqueness in IDs.
- Returns:
true if uniqueness is enforced
-
strictTipText
public String strictTipText()
Returns the tip text for this property.
- Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.
-
getQuickInfo
public String getQuickInfo()
Returns a quick info about the actor, which will be displayed in the GUI.
- Specified by:
getQuickInfo in interface Actor
- Specified by:
getQuickInfo in interface QuickInfoSupporter
- Overrides:
getQuickInfo in class AbstractActor
- Returns:
null if no info available, otherwise short string
-
accepts
public Class[] accepts()
Returns the class that the consumer accepts.
- Specified by:
accepts in interface InputConsumer
- Returns:
the Class of objects that can be processed
-
generates
public Class[] generates()
Returns the class of objects that it generates.
- Specified by:
generates in interface OutputProducer
- Returns:
the Class of the generated tokens
-
excludeAttributes
protected SpreadSheet excludeAttributes(SpreadSheet sheet)
Excludes columns from the data.
- Parameters:
sheet - the data to process
- Returns:
the processed data
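The interplay of the excluded-attributes expression and the inverted matching sense can be sketched in plain Java, without the ADAMS classes. The class ExcludeSketch and the method keepColumns are hypothetical names introduced for illustration only. The sketch assumes that the expression has to match the whole column name; the actual matching behavior of excludeAttributes may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ADAMS: filters column names by a regular expression.
final class ExcludeSketch {

    /**
     * Returns the column names that survive the exclusion step.
     * Assumption: the expression must match the whole column name.
     */
    static List<String> keepColumns(List<String> names, String regex, boolean invert) {
        // An empty expression includes all columns.
        if (regex.isEmpty()) {
            return new ArrayList<>(names);
        }
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            boolean matches = pattern.matcher(name).matches();
            // Normal sense: matching columns are dropped.
            // Inverted sense: only matching columns are kept.
            if (matches == invert) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = List.of("id", "temp_a", "temp_b", "note");
        System.out.println(keepColumns(names, "temp_.*", false)); // [id, note]
        System.out.println(keepColumns(names, "temp_.*", true));  // [temp_a, temp_b]
    }
}
```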
-
createPrefix
protected String createPrefix(int index)
Generates the prefix string.
- Parameters:
index - the index of the spreadsheet to produce the prefix for
- Returns:
the generated prefix
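How the prefix-related options might combine can be sketched as follows. PrefixSketch and its methods are hypothetical stand-ins, not the ADAMS implementation. The order of concatenation (prefix string, then index, then separator) and the use of the index exactly as passed in are assumptions; the documentation above only states that the separator sits between the generated prefix and the original attribute name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ADAMS: mimics the prefix-related options.
final class PrefixSketch {

    /** Builds the prefix: prefix string, then the index (if requested), then the separator. */
    static String createPrefix(String prefix, boolean addIndex, String separator, int index) {
        StringBuilder result = new StringBuilder(prefix);
        if (addIndex) {
            result.append(index); // assumption: the index is appended as passed in
        }
        result.append(separator);
        return result.toString();
    }

    /** Prefixes every column name; leaves the names unchanged when prefixes are disabled. */
    static List<String> prefixColumns(List<String> names, boolean usePrefix,
                                      String prefix, boolean addIndex,
                                      String separator, int index) {
        String generated = usePrefix ? createPrefix(prefix, addIndex, separator, index) : "";
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(generated + name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of("temp", "note");
        // Uses the documented defaults "dataset" and "-" for prefix and separator.
        System.out.println(prefixColumns(names, true, "dataset", true, "-", 1));
    }
}
```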
-
prefixColumns
protected SpreadSheet prefixColumns(SpreadSheet inst, int index)
Prefixes the columns.
- Parameters:
inst - the data to process
index - the index of the spreadsheet
- Returns:
the processed data
-
prepareData
protected SpreadSheet prepareData(SpreadSheet inst, int index)
Prepares the data, prefixing columns, removing columns, etc., before merging it.
- Parameters:
inst - the data to process
index - the 0-based index of the dataset being processed
- Returns:
the prepared data
-
updateIDs
protected void updateIDs(int sheetIndex, SpreadSheet inst, HashSet ids)
Updates the IDs in the hashset with the ones stored in the ID column of the provided spreadsheet.
- Parameters:
sheetIndex - the spreadsheet index
inst - the spreadsheet to obtain the IDs from
ids - the hashset to store the IDs in
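Collecting the IDs across spreadsheets, together with the strict option, might look roughly like the sketch below. IdSketch is a hypothetical class that represents the ID column as a plain list of strings. Two assumptions are made: uniqueness is checked within each spreadsheet, and a violation is signalled with an exception (the actor itself reports errors through the message returned by doExecute).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of ADAMS: gathers IDs from one spreadsheet's ID column.
final class IdSketch {

    /**
     * Adds the IDs of one spreadsheet to the shared set.
     * Assumption: in strict mode a repeated ID within the same sheet is an error.
     */
    static void updateIDs(int sheetIndex, List<String> idColumn, Set<String> ids, boolean strict) {
        Set<String> seen = new HashSet<>();
        for (String id : idColumn) {
            boolean firstOccurrence = seen.add(id);
            if (!firstOccurrence && strict) {
                throw new IllegalStateException(
                    "Duplicate ID '" + id + "' in spreadsheet #" + sheetIndex);
            }
        }
        // IDs shared between spreadsheets collapse into a single entry.
        ids.addAll(seen);
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>();
        updateIDs(0, List.of("a", "b"), ids, true);
        updateIDs(1, List.of("b", "c"), ids, true);
        System.out.println(ids.size()); // 3
    }
}
```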
-
merge
protected SpreadSheet merge(SpreadSheet[] orig, SpreadSheet[] sheets, HashSet ids)
Merges the datasets based on the collected IDs.
- Parameters:
orig - the original datasets
sheets - the processed datasets to merge into one
ids - the IDs for identifying the rows
- Returns:
the merged dataset
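The idea of an ID-based merge, including the effect of the remove option, can be illustrated with plain Java maps. MergeSketch is a hypothetical class; each spreadsheet is modelled as a map from ID to a row, and each row as a map from column name to cell value. This is only a sketch of the general technique under those assumptions and does not reproduce how ADAMS handles missing cells or row order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of ADAMS: merges rows that share the same ID.
final class MergeSketch {

    /**
     * Combines the rows of all sheets per ID.
     * With remove == true, only IDs present in every sheet are kept.
     */
    static Map<String, Map<String, String>> merge(
            List<Map<String, Map<String, String>>> sheets, Set<String> ids, boolean remove) {
        Map<String, Map<String, String>> result = new TreeMap<>(); // sorted by ID for stable output
        for (String id : ids) {
            Map<String, String> merged = new LinkedHashMap<>();
            boolean complete = true;
            for (Map<String, Map<String, String>> sheet : sheets) {
                Map<String, String> row = sheet.get(id);
                if (row == null) {
                    complete = false; // this sheet has no data for the ID
                    continue;
                }
                merged.putAll(row);
            }
            if (complete || !remove) {
                result.put(id, merged);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> first = Map.of("a", Map.of("x", "1"), "b", Map.of("x", "2"));
        Map<String, Map<String, String>> second = Map.of("b", Map.of("y", "3"), "c", Map.of("y", "4"));
        Set<String> ids = Set.of("a", "b", "c");
        System.out.println(merge(List.of(first, second), ids, false).keySet()); // [a, b, c]
        System.out.println(merge(List.of(first, second), ids, true).keySet());  // [b]
    }
}
```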
-
doExecute
protected String doExecute()
Executes the flow item.
- Specified by:
doExecute in class AbstractActor
- Returns:
null if everything is fine, otherwise error message
-
-