adams.flow.transformer
Class SpreadSheetMerge

java.lang.Object
  extended by adams.core.ConsoleObject
      extended by adams.core.option.AbstractOptionHandler
          extended by adams.flow.core.AbstractActor
              extended by adams.flow.transformer.AbstractTransformer
                  extended by adams.flow.transformer.SpreadSheetMerge
All Implemented Interfaces:
AdditionalInformationHandler, CleanUpHandler, Debuggable, DebugOutputHandler, Destroyable, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<AbstractActor>, SizeOfHandler, Stoppable, VariableChangeListener, Actor, ErrorHandler, InputConsumer, OutputProducer, Serializable, Comparable

public class SpreadSheetMerge
extends AbstractTransformer

Merges two or more spreadsheets. The merge can be done by using a common key-column or by simply putting the spreadsheets side-by-side.
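
To illustrate the side-by-side mode mentioned above, here is a minimal sketch in plain Java. It does not use the ADAMS SpreadSheet API: a sheet is modelled as a list of rows, the class and method names (SideBySideSketch, sideBySide) and the widths parameter are hypothetical, and padding shorter sheets with empty cells is an assumption, not documented behaviour of this actor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only: a "sheet" is a list of rows, each row a list of cell strings.
class SideBySideSketch {

  /**
   * Concatenates row r of every sheet into one output row.
   * widths[s] is the column count of sheet s; sheets with fewer rows
   * are padded with empty cells (an assumption made for this sketch).
   */
  static List<List<String>> sideBySide(List<List<List<String>>> sheets, int[] widths) {
    int maxRows = 0;
    for (List<List<String>> sheet : sheets)
      maxRows = Math.max(maxRows, sheet.size());

    List<List<String>> merged = new ArrayList<>();
    for (int r = 0; r < maxRows; r++) {
      List<String> row = new ArrayList<>();
      for (int s = 0; s < sheets.size(); s++) {
        List<List<String>> sheet = sheets.get(s);
        if (r < sheet.size()) {
          row.addAll(sheet.get(r));
        } else {
          // this sheet has no row r: pad with empty cells
          for (int c = 0; c < widths[s]; c++)
            row.add("");
        }
      }
      merged.add(row);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<List<String>> a = Arrays.asList(Arrays.asList("1", "x"), Arrays.asList("2", "y"));
    List<List<String>> b = Arrays.asList(Arrays.asList("p"));
    System.out.println(sideBySide(Arrays.asList(a, b), new int[]{2, 1}));
  }
}
```

Because rows are paired purely by position, this mode only makes sense when all spreadsheets list their rows in the same order; otherwise a key column is the safer choice.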

Input/output:
- accepts:
   adams.data.spreadsheet.SpreadSheet[]
- generates:
   adams.data.spreadsheet.SpreadSheet[]

Valid options are:

-D <int> (property: debugLevel)
    The greater the number the more additional info the scheme may output to 
    the console (0 = off).
    default: 0
    minimum: 0
 
-name <java.lang.String> (property: name)
    The name of the actor.
    default: SpreadSheetMerge
 
-annotation <adams.core.base.BaseText> (property: annotations)
    The annotations to attach to this actor.
    default: 
 
-skip (property: skip)
    If set to true, transformation is skipped and the input token is just forwarded 
    as it is.
 
-stop-flow-on-error (property: stopFlowOnError)
    If set to true, the flow gets stopped in case this actor encounters an error;
     useful for critical actors.
 
-use-prefix (property: usePrefix)
    Whether to prefix the attribute names of each dataset with an index and 
    an optional string.
 
-add-index (property: addIndex)
    Whether to add the index of the dataset to the prefix.
 
-remove (property: remove)
    If true, only keep rows for which data is available from every source spreadsheet.
 
-prefix <java.lang.String> (property: prefix)
    The optional prefix string to prefix the index number with (in case prefixes 
    are used).
    default: dataset
 
-prefix-separator <java.lang.String> (property: prefixSeparator)
    The separator string between the generated prefix and the original attribute 
    name.
    default: -
 
-exclude-atts <java.lang.String> (property: excludedAttributes)
    The regular expression used on the attribute names, to determine whether 
    an attribute should be excluded or not (matching sense can be inverted); 
    leave empty to include all attributes.
    default: 
 
-invert (property: invertMatchingSense)
    Whether to invert the matching sense of excluding attributes, i.e., the regular 
    expression is used for including attributes.
 
-unique-id <java.lang.String> (property: uniqueID)
    The name of the column used for uniquely identifying rows among the spreadsheets.
    default: 
 

Version:
$Revision: 7009 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Field Summary
protected  boolean m_AddIndex
          whether to add the index to the prefix.
protected  String m_ExcludedAttributes
          regular expression for excluding attributes from the datasets.
protected  boolean m_InvertMatchingSense
          whether to invert the matching sense for excluding attributes.
protected  String m_Prefix
          the additional prefix name to use, apart from the index.
protected  String m_PrefixSeparator
          the separator between the generated prefix and the actual attribute name.
protected  boolean m_Remove
          whether to remove rows for which not every spreadsheet provides data.
protected  String m_UniqueID
          the string or numeric attribute to use as unique identifier for rows.
protected  boolean m_UsePrefix
          whether to prefix the attribute names of each dataset with an index.
 
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
 
Fields inherited from class adams.flow.core.AbstractActor
m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_Headless, m_Name, m_Parent, m_Root, m_ScopeHandler, m_Self, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
 
Fields inherited from class adams.core.option.AbstractOptionHandler
m_DebugLevel, m_OptionManager
 
Fields inherited from interface adams.flow.core.Actor
FILE_EXTENSION, FILE_EXTENSION_GZ
 
Constructor Summary
SpreadSheetMerge()
           
 
Method Summary
 Class[] accepts()
          Returns the classes that the consumer accepts.
 String addIndexTipText()
          Returns the tip text for this property.
protected  String createPrefix(int index)
          Generates the prefix string.
 void defineOptions()
          Adds options to the internal list of options.
protected  String doExecute()
          Executes the flow item.
protected  SpreadSheet excludeAttributes(SpreadSheet sheet)
          Excludes columns from the data.
 String excludedAttributesTipText()
          Returns the tip text for this property.
 Class[] generates()
          Returns the classes of objects that it generates.
 boolean getAddIndex()
          Returns whether to add the dataset index number to the prefix.
 String getExcludedAttributes()
          Returns the regular expression for excluding attributes.
 boolean getInvertMatchingSense()
          Returns whether to invert the matching sense.
 String getPrefix()
          Returns the optional prefix string.
 String getPrefixSeparator()
          Returns the prefix separator string.
 String getQuickInfo()
          Returns a quick info about the actor, which will be displayed in the GUI.
 boolean getRemove()
          Returns whether to remove rows for which not every spreadsheet provides data.
 String getUniqueID()
          Returns the attribute (string/numeric) to use for uniquely identifying rows.
 boolean getUsePrefix()
          Returns whether to use prefixes.
 String globalInfo()
          Returns a string describing the object.
 String invertMatchingSenseTipText()
          Returns the tip text for this property.
protected  SpreadSheet merge(SpreadSheet[] orig, SpreadSheet[] inst, HashSet ids)
          Merges the datasets based on the collected IDs.
protected  SpreadSheet prefixColumns(SpreadSheet inst, int index)
          Prefixes the columns.
 String prefixSeparatorTipText()
          Returns the tip text for this property.
 String prefixTipText()
          Returns the tip text for this property.
protected  SpreadSheet prepareData(SpreadSheet inst, int index)
          Prepares the data, prefixing columns, removing columns, etc, before merging it.
 String removeTipText()
          Returns the tip text for this property.
 void setAddIndex(boolean value)
          Sets whether to add the dataset index number to the prefix.
 void setExcludedAttributes(String value)
          Sets the regular expression for excluding attributes.
 void setInvertMatchingSense(boolean value)
          Sets whether to invert the matching sense.
 void setPrefix(String value)
          Sets the optional prefix string.
 void setPrefixSeparator(String value)
          Sets the prefix separator string.
 void setRemove(boolean value)
          Sets whether to remove rows for which not every spreadsheet provides data.
 void setUniqueID(String value)
          Sets the attribute (string/numeric) to use for uniquely identifying rows.
 void setUsePrefix(boolean value)
          Sets whether to use prefixes.
 String uniqueIDTipText()
          Returns the tip text for this property.
protected  void updateIDs(SpreadSheet inst, HashSet ids)
          Updates the IDs in the hashset with the ones stored in the ID column of the provided spreadsheet.
 String usePrefixTipText()
          Returns the tip text for this property.
 
Methods inherited from class adams.flow.transformer.AbstractTransformer
backupState, execute, hasPendingOutput, input, output, postExecute, reset, restoreState, wrapUp
 
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, debug, destroy, equals, findVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, initialize, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, preExecute, pruneBackup, pruneBackup, setAnnotations, setErrorHandler, setHeadless, setName, setParent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
 
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, debug, debugLevelTipText, finishInit, getDebugLevel, getOptionManager, isDebugOn, newOptionManager, setDebugLevel, toCommandLine, toString
 
Methods inherited from class adams.core.ConsoleObject
getDebugging, getSystemErr, getSystemOut
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface adams.flow.core.Actor
cleanUp, compareTo, debug, destroy, equals, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setHeadless, setName, setParent, setSkip, setStopFlowOnError, setUp, setVariables, sizeOf, stopExecution, stopExecution, variableChanged
 
Methods inherited from interface adams.core.AdditionalInformationHandler
getAdditionalInformation
 
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager
 

Field Detail

m_UsePrefix

protected boolean m_UsePrefix
whether to prefix the attribute names of each dataset with an index.


m_AddIndex

protected boolean m_AddIndex
whether to add the index to the prefix.


m_Remove

protected boolean m_Remove
whether to remove rows for which not every spreadsheet provides data.


m_Prefix

protected String m_Prefix
the additional prefix name to use, apart from the index.


m_PrefixSeparator

protected String m_PrefixSeparator
the separator between the generated prefix and the actual attribute name.


m_ExcludedAttributes

protected String m_ExcludedAttributes
regular expression for excluding attributes from the datasets.


m_InvertMatchingSense

protected boolean m_InvertMatchingSense
whether to invert the matching sense for excluding attributes.


m_UniqueID

protected String m_UniqueID
the string or numeric attribute to use as unique identifier for rows.

Constructor Detail

SpreadSheetMerge

public SpreadSheetMerge()
Method Detail

globalInfo

public String globalInfo()
Returns a string describing the object.

Specified by:
globalInfo in class AbstractOptionHandler
Returns:
a description suitable for displaying in the gui

defineOptions

public void defineOptions()
Adds options to the internal list of options.

Specified by:
defineOptions in interface OptionHandler
Overrides:
defineOptions in class AbstractActor

setRemove

public void setRemove(boolean value)
Sets whether to remove rows for which not every spreadsheet provides data.

Parameters:
value - if true, rows that are not present in every spreadsheet are removed

getRemove

public boolean getRemove()
Returns whether to remove rows for which not every spreadsheet provides data.

Returns:
true if rows that are not present in every spreadsheet are removed

removeTipText

public String removeTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setUsePrefix

public void setUsePrefix(boolean value)
Sets whether to use prefixes.

Parameters:
value - if true then the attributes will get prefixed

getUsePrefix

public boolean getUsePrefix()
Returns whether to use prefixes.

Returns:
true if the attributes will get prefixed

usePrefixTipText

public String usePrefixTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setAddIndex

public void setAddIndex(boolean value)
Sets whether to add the dataset index number to the prefix.

Parameters:
value - if true then the index will be used in the prefix

getAddIndex

public boolean getAddIndex()
Returns whether to add the dataset index number to the prefix.

Returns:
true if the index will be used in the prefix

addIndexTipText

public String addIndexTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setPrefix

public void setPrefix(String value)
Sets the optional prefix string.

Parameters:
value - the optional prefix string

getPrefix

public String getPrefix()
Returns the optional prefix string.

Returns:
the optional prefix string

prefixTipText

public String prefixTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setPrefixSeparator

public void setPrefixSeparator(String value)
Sets the prefix separator string.

Parameters:
value - the prefix separator string

getPrefixSeparator

public String getPrefixSeparator()
Returns the prefix separator string.

Returns:
the prefix separator string

prefixSeparatorTipText

public String prefixSeparatorTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setExcludedAttributes

public void setExcludedAttributes(String value)
Sets the regular expression for excluding attributes.

Parameters:
value - the regular expression

getExcludedAttributes

public String getExcludedAttributes()
Returns the regular expression for excluding attributes.

Returns:
the regular expression

excludedAttributesTipText

public String excludedAttributesTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setInvertMatchingSense

public void setInvertMatchingSense(boolean value)
Sets whether to invert the matching sense.

Parameters:
value - if true then matching sense gets inverted

getInvertMatchingSense

public boolean getInvertMatchingSense()
Returns whether to invert the matching sense.

Returns:
true if the matching sense gets inverted

invertMatchingSenseTipText

public String invertMatchingSenseTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setUniqueID

public void setUniqueID(String value)
Sets the attribute (string/numeric) to use for uniquely identifying rows.

Parameters:
value - the attribute name

getUniqueID

public String getUniqueID()
Returns the attribute (string/numeric) to use for uniquely identifying rows.

Returns:
the attribute name

uniqueIDTipText

public String uniqueIDTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

getQuickInfo

public String getQuickInfo()
Returns a quick info about the actor, which will be displayed in the GUI.

Specified by:
getQuickInfo in interface QuickInfoSupporter
Specified by:
getQuickInfo in interface Actor
Overrides:
getQuickInfo in class AbstractActor
Returns:
null if no info available, otherwise short string

accepts

public Class[] accepts()
Returns the classes that the consumer accepts.

Returns:
the classes of objects that can be processed

generates

public Class[] generates()
Returns the classes of objects that it generates.

Returns:
the classes of the generated tokens

excludeAttributes

protected SpreadSheet excludeAttributes(SpreadSheet sheet)
Excludes columns from the data.

Parameters:
sheet - the data to process
Returns:
the processed data
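
The interplay of the exclusion expression and the inverted matching sense can be sketched in plain Java as follows. This is not the actor's actual implementation: the class and method names (ExcludeColumnsSketch, keep) are hypothetical, and the sketch assumes that the expression must match the whole column name. Only the "empty expression keeps everything" rule is taken from the -exclude-atts option text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative stand-in only; works on column names rather than ADAMS SpreadSheet objects.
class ExcludeColumnsSketch {

  /**
   * Returns the column names that survive the filter.
   * An empty expression keeps all columns. Full-name matching is an
   * assumption of this sketch.
   */
  static List<String> keep(List<String> names, String regex, boolean invert) {
    if (regex.isEmpty())
      return new ArrayList<>(names);

    Pattern pattern = Pattern.compile(regex);
    List<String> kept = new ArrayList<>();
    for (String name : names) {
      boolean matches = pattern.matcher(name).matches();
      // not inverted: matching columns are excluded
      // inverted:     only matching columns are kept
      if (matches == invert)
        kept.add(name);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> cols = Arrays.asList("id", "tmp_a", "tmp_b", "value");
    System.out.println(keep(cols, "tmp_.*", false));
    System.out.println(keep(cols, "tmp_.*", true));
  }
}
```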

createPrefix

protected String createPrefix(int index)
Generates the prefix string.

Parameters:
index - the index of the spreadsheet to produce the prefix for
Returns:
the generated prefix
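
How the prefix, index and prefixSeparator properties might combine is sketched below in plain Java. The real method takes only the index and reads the other values from the actor's fields; the extra parameters, the class name PrefixSketch, the helper prefixColumn and the 1-based index in the output are all assumptions made for illustration.

```java
// Illustrative stand-in only; the exact string layout used by the actor is not documented here.
class PrefixSketch {

  /**
   * Builds prefix + optional index + separator.
   * index is the 0-based position of the spreadsheet; printing it 1-based
   * is an assumption of this sketch.
   */
  static String createPrefix(String prefix, boolean addIndex, String separator, int index) {
    StringBuilder result = new StringBuilder(prefix);
    if (addIndex)
      result.append(index + 1);
    result.append(separator);
    return result.toString();
  }

  /** Applies the prefix to one column name, or returns it unchanged if prefixes are off. */
  static String prefixColumn(String column, boolean usePrefix, String prefix,
                             boolean addIndex, String separator, int index) {
    if (!usePrefix)
      return column;
    return createPrefix(prefix, addIndex, separator, index) + column;
  }

  public static void main(String[] args) {
    // option defaults from the list above: prefix "dataset", separator "-"
    System.out.println(prefixColumn("temp", true, "dataset", true, "-", 0));
  }
}
```

Prefixing is mainly useful when several spreadsheets share column names, since it keeps the merged columns distinguishable.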

prefixColumns

protected SpreadSheet prefixColumns(SpreadSheet inst,
                                    int index)
Prefixes the columns.

Parameters:
index - the index of the spreadsheet
inst - the data to process
Returns:
the processed data

prepareData

protected SpreadSheet prepareData(SpreadSheet inst,
                                  int index)
Prepares the data, prefixing columns, removing columns, etc, before merging it.

Parameters:
inst - the data to process
index - the 0-based index of the dataset being processed
Returns:
the prepared data

updateIDs

protected void updateIDs(SpreadSheet inst,
                         HashSet ids)
Updates the IDs in the hashset with the ones stored in the ID column of the provided spreadsheet.

Parameters:
inst - the spreadsheet to obtain the IDs from
ids - the hashset to store the IDs in

merge

protected SpreadSheet merge(SpreadSheet[] orig,
                            SpreadSheet[] inst,
                            HashSet ids)
Merges the datasets based on the collected IDs.

Parameters:
orig - the original datasets
inst - the processed datasets to merge into one
ids - the IDs for identifying the rows
Returns:
the merged dataset
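
A key-based merge along these lines can be sketched in plain Java as shown below. It is a simplified stand-in, not the actor's code: rows are modelled as maps from column name to cell value, the names KeyMergeSketch, row and merge(sheets, key, remove) are hypothetical, and a sorted set replaces the HashSet so that the output order is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative stand-in only: a sheet is a list of rows, a row maps column name to cell value.
class KeyMergeSketch {

  /** Convenience builder: row("id", "1", "a", "x") gives {id=1, a=x}. */
  static Map<String, String> row(String... keyValues) {
    Map<String, String> result = new LinkedHashMap<>();
    for (int i = 0; i + 1 < keyValues.length; i += 2)
      result.put(keyValues[i], keyValues[i + 1]);
    return result;
  }

  /**
   * Combines rows that share the same value in the key column.
   * If remove is true, IDs that are missing from any sheet are dropped.
   */
  static List<Map<String, String>> merge(List<List<Map<String, String>>> sheets,
                                         String key, boolean remove) {
    // step 1: collect every ID that occurs in any sheet
    Set<String> ids = new TreeSet<>();
    for (List<Map<String, String>> sheet : sheets)
      for (Map<String, String> r : sheet)
        if (r.containsKey(key))
          ids.add(r.get(key));

    // step 2: build one output row per ID
    List<Map<String, String>> merged = new ArrayList<>();
    for (String id : ids) {
      Map<String, String> out = new LinkedHashMap<>();
      out.put(key, id);
      boolean complete = true;
      for (List<Map<String, String>> sheet : sheets) {
        Map<String, String> match = null;
        for (Map<String, String> r : sheet) {
          if (id.equals(r.get(key))) {
            match = r;
            break;
          }
        }
        if (match == null)
          complete = false;   // this sheet has no row for the ID
        else
          out.putAll(match);  // same-named columns would overwrite each other
      }
      if (complete || !remove)
        merged.add(out);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Map<String, String>> s1 = Arrays.asList(row("id", "1", "a", "x"), row("id", "2", "a", "y"));
    List<Map<String, String>> s2 = Arrays.asList(row("id", "1", "b", "p"));
    System.out.println(merge(Arrays.asList(s1, s2), "id", true));
    System.out.println(merge(Arrays.asList(s1, s2), "id", false));
  }
}
```

In this sketch, columns with identical names in different sheets overwrite one another, which is the situation the prefix options are meant to avoid.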

doExecute

protected String doExecute()
Executes the flow item.

Specified by:
doExecute in class AbstractActor
Returns:
null if everything is fine, otherwise error message


Copyright © 2013 University of Waikato, Hamilton, NZ. All Rights Reserved.