Class JoinOnID
- java.lang.Object
  - adams.core.logging.LoggingObject
    - adams.core.logging.CustomLoggingLevelObject
      - adams.core.option.AbstractOptionHandler
        - adams.flow.transformer.wekadatasetsmerge.AbstractMerge
          - adams.flow.transformer.wekadatasetsmerge.JoinOnID
- All Implemented Interfaces:
Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, SizeOfHandler, Serializable
public class JoinOnID extends AbstractMerge
Joins the datasets by concatenating rows that share a unique ID.
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-class-finder <adams.data.weka.columnfinder.ColumnFinder> (property: classFinder) The column finder to use to find class attributes in the datasets. default: adams.data.weka.columnfinder.Class
-dataset-names <adams.core.base.BaseString> [-dataset-names ...] (property: datasetNames) The list of dataset names to use in attribute renaming. default:
-attr-renames-exp <adams.core.base.BaseRegExp> [-attr-renames-exp ...] (property: attributeRenamesExp) The regular expressions used to select attribute names for renaming (one per dataset). default: For more on regular expressions, see: https://docs.oracle.com/javase/tutorial/essential/regex/ https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
-attr-renames-format <adams.core.base.BaseString> [-attr-renames-format ...] (property: attributeRenamesFormat) One format string for each renaming expression to specify how to rename the attribute. Can contain the {DATASET} keyword which will be replaced by the dataset name, and also group identifiers which will be replaced by groups from the renaming regex. default:
-output-name <java.lang.String> (property: outputName) The name to use for the merged dataset. default: output
-ensure-equal-values <boolean> (property: ensureEqualValues) Whether multiple attributes being merged into a single attribute require equal values from all sources. default: false
-unique-id <java.lang.String> (property: uniqueID) The name of the attribute to use as the joining key for the merge. default:
-complete-rows-only <boolean> (property: completeRowsOnly) Whether only those IDs that have source data in all datasets should be merged. default: false
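The interplay of -attr-renames-exp, -attr-renames-format and -dataset-names can be illustrated with plain java.util.regex. The following is a minimal sketch, not the ADAMS implementation: the class RenameSketch and its rename method are hypothetical, group identifiers are assumed to use Java's $1, $2 replacement syntax, and the expression is assumed to have to match the whole attribute name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of ADAMS. */
final class RenameSketch {

  /**
   * Returns the renamed attribute, or the original name if the expression
   * does not match the whole attribute name (an assumption of this sketch).
   */
  static String rename(String attributeName, String datasetName, String regex, String format) {
    Matcher matcher = Pattern.compile(regex).matcher(attributeName);
    if (!matcher.matches())
      return attributeName;
    // Substitute the dataset name first; quote it so that '$' or '\' in the
    // dataset name are not interpreted as group references.
    String replacement = format.replace("{DATASET}", Matcher.quoteReplacement(datasetName));
    // Assumption: groups are referenced as $1, $2, ... in the format string.
    StringBuffer result = new StringBuffer();
    matcher.appendReplacement(result, replacement);
    return result.toString();
  }
}
```

Under these assumptions, the expression (.*) with the format {DATASET}-$1 prefixes every attribute of a dataset with that dataset's name, which keeps equally named attributes from different sources apart in the merged dataset.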
Performs a merge by using a unique ID attribute for each source dataset to concatenate rows with the same ID.
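The join itself can be pictured without any Weka or ADAMS types. The sketch below is an illustration under stated assumptions, not the class's actual code: JoinOnIdSketch is a hypothetical name, each dataset is modelled as a map from ID to the row's remaining values, missing source data is represented by null, and the output order (IDs in first-seen order) is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, dependency-free illustration of a join on a unique ID. */
final class JoinOnIdSketch {

  /**
   * Each dataset maps an ID to that row's non-ID values; widths[i] is the
   * number of non-ID columns in dataset i.
   */
  static List<List<String>> join(List<Map<String, List<String>>> datasets,
                                 int[] widths, boolean completeRowsOnly) {
    // Collect IDs in first-seen order (assumes insertion-ordered maps).
    Set<String> ids = new LinkedHashSet<>();
    for (Map<String, List<String>> dataset : datasets)
      ids.addAll(dataset.keySet());

    List<List<String>> merged = new ArrayList<>();
    for (String id : ids) {
      List<String> row = new ArrayList<>();
      row.add(id);
      boolean complete = true;
      for (int i = 0; i < datasets.size(); i++) {
        List<String> values = datasets.get(i).get(id);
        if (values == null) {
          // This dataset has no row for the ID: pad with missing values.
          complete = false;
          for (int c = 0; c < widths[i]; c++)
            row.add(null);
        } else {
          row.addAll(values);
        }
      }
      // Mirrors the -complete-rows-only option described above.
      if (complete || !completeRowsOnly)
        merged.add(row);
    }
    return merged;
  }
}
```

With one dataset holding IDs a and b and another holding b and c, the sketch yields three rows when completeRowsOnly is false and only the row for b when it is true.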
- Author:
- Corey Sterling (csterlin at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
class JoinOnID.UniqueIDEnumeration
Enumeration class that returns the rows from the source datasets joined on the unique ID attribute.
-
Nested classes/interfaces inherited from class adams.flow.transformer.wekadatasetsmerge.AbstractMerge
AbstractMerge.SourceAttribute
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected boolean m_CompleteRowsOnly
Whether or not to skip IDs that don't exist in all source datasets.
protected String m_UniqueID
The name of the attribute to use as the merge key.
-
Fields inherited from class adams.flow.transformer.wekadatasetsmerge.AbstractMerge
DATASET_KEYWORD, m_AttributeRenameFindRegexs, m_AttributeRenameFormatStrings, m_ClassAttributes, m_ClassFinder, m_DatasetNames, m_Datasets, m_EnsureEqualValues, m_MergedDatasetName, ROW_MISSING
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
Constructors (Constructor, Description):
JoinOnID()
-
Method Summary
Methods (Modifier and Type, Method, Description):
protected String check(weka.core.Instances[] datasets)
Hook method for performing checks before attempting the merge.
protected String checkAllDatasetsHaveIDAttribute(weka.core.Instances[] datasets)
Checks that each of the given datasets has the unique ID attribute.
protected String checkAttributeMapping(Map<String,List<AbstractMerge.SourceAttribute>> attributeMapping)
Makes sure the source data for each mapped attribute is the same type.
protected int compare(List<AbstractMerge.SourceAttribute> sources1, List<AbstractMerge.SourceAttribute> sources2)
Compares two lists of source attributes to determine the order in which their mapped attributes should appear in the merged dataset.
String completeRowsOnlyTipText()
Gets the tip-text for the complete-rows-only option.
void defineOptions()
Adds options to the internal list of options.
protected int findAttributeIndexOfUniqueID(weka.core.Instances dataset)
Finds the index of the unique ID attribute in the given dataset.
boolean getCompleteRowsOnly()
Gets whether incomplete rows should be skipped.
protected String getMappedAttributeName(AbstractMerge.SourceAttribute source)
Gets the name of the attribute in the merged dataset that the given source attribute maps to.
protected Enumeration<int[]> getRowSetEnumeration()
Allows specific merge methods to specify the order in which rows are placed into the merged dataset, and which rows from the source datasets are used for the source data.
String getUniqueID()
Gets the name of the unique ID attribute that the merge is joining on.
String globalInfo()
Returns a string describing the object.
protected boolean isUniqueIDName(String attributeName)
Whether the given attribute name is the name of the unique ID attribute.
void setCompleteRowsOnly(boolean value)
Sets whether incomplete rows should be skipped.
void setUniqueID(String value)
Sets the name of the unique ID attribute that the merge is joining on.
String uniqueIDTipText()
Gets the tip-text for the unique ID option.
-
Methods inherited from class adams.flow.transformer.wekadatasetsmerge.AbstractMerge
attributeRenamesExpTipText, attributeRenamesFormatTipText, classFinderTipText, createAttributeMapping, createEmptyResultantDataset, createMappedAttribute, datasetNamesTipText, ensureEqualValuesTipText, getAttributeRenamesExp, getAttributeRenamesFormat, getClassFinder, getDatasetNames, getEnsureEqualValues, getOutputName, getQuickInfo, getValue, getValueEnsureEqual, getValueFirstAvailable, isAnyClassAttribute, isClassAttribute, isClassAttribute, merge, newInstanceForDataset, outputNameTipText, recordClassAttributes, resetInternalState, setAttributeRenamesExp, setAttributeRenamesFormat, setClassFinder, setDatasetNames, setEnsureEqualValues, setOutputName, setValue
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
Field Detail
-
m_UniqueID
protected String m_UniqueID
The name of the attribute to use as the merge key.
-
m_CompleteRowsOnly
protected boolean m_CompleteRowsOnly
Whether or not to skip IDs that don't exist in all source datasets.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- A description suitable for displaying in the GUI.
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractMerge
-
getUniqueID
public String getUniqueID()
Gets the name of the unique ID attribute that the merge is joining on.
- Returns:
- The name of the unique ID attribute.
-
setUniqueID
public void setUniqueID(String value)
Sets the name of the unique ID attribute that the merge is joining on.
- Parameters:
value - The name of the unique ID attribute.
-
uniqueIDTipText
public String uniqueIDTipText()
Gets the tip-text for the unique ID option.
- Returns:
- The tip-text as a String.
-
getCompleteRowsOnly
public boolean getCompleteRowsOnly()
Gets whether incomplete rows should be skipped.
- Returns:
- Whether incomplete rows should be skipped.
-
setCompleteRowsOnly
public void setCompleteRowsOnly(boolean value)
Sets whether incomplete rows should be skipped.
- Parameters:
value - Whether incomplete rows should be skipped.
-
completeRowsOnlyTipText
public String completeRowsOnlyTipText()
Gets the tip-text for the complete-rows-only option.
- Returns:
- The tip-text as a String.
-
checkAllDatasetsHaveIDAttribute
protected String checkAllDatasetsHaveIDAttribute(weka.core.Instances[] datasets)
Checks that each of the given datasets has the unique ID attribute. Also checks that the unique ID attribute has the same data type in all datasets.
- Parameters:
datasets - The datasets that are to be merged.
- Returns:
- Null if all datasets have the unique ID attribute, otherwise an error message.
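The null-or-error-message convention used by this check can be sketched as follows. This is an illustration only: IdCheckSketch is a hypothetical class, each dataset is reduced to a map from attribute name to a type label, and the wording of the error messages is invented for the example.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the null-or-error-message convention. */
final class IdCheckSketch {

  /**
   * Each schema maps attribute names to a type label such as "numeric".
   * Returns null if every schema has the ID attribute with one common type,
   * otherwise a message describing the first problem found.
   */
  static String check(List<Map<String, String>> schemas, String uniqueId) {
    String expectedType = null;
    for (int i = 0; i < schemas.size(); i++) {
      String type = schemas.get(i).get(uniqueId);
      if (type == null)
        return "Dataset " + (i + 1) + " has no attribute '" + uniqueId + "'";
      if (expectedType == null)
        expectedType = type; // the first dataset fixes the expected type
      else if (!expectedType.equals(type))
        return "Attribute '" + uniqueId + "' is " + type + " in dataset "
          + (i + 1) + " but " + expectedType + " in dataset 1";
    }
    return null; // null signals success
  }
}
```

A caller would treat a non-null result as a reason to abort the merge and report the message.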
-
isUniqueIDName
protected boolean isUniqueIDName(String attributeName)
Whether the given attribute name is the name of the unique ID attribute.
- Parameters:
attributeName - The attribute name to check.
- Returns:
- True if the given attribute name is the unique ID name, false otherwise.
-
findAttributeIndexOfUniqueID
protected int findAttributeIndexOfUniqueID(weka.core.Instances dataset)
Finds the index of the unique ID attribute in the given dataset.
- Parameters:
dataset - The dataset to search.
- Returns:
- The index of the unique ID attribute, or -1 if not found.
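The -1 sentinel means callers have to test the result before using it as an index. A minimal sketch of such a lookup, using a plain list of attribute names in place of weka.core.Instances (FindIdIndexSketch is a hypothetical name, and exact, case-sensitive name matching is an assumption):

```java
import java.util.List;

/** Hypothetical illustration of a name lookup with a -1 sentinel. */
final class FindIdIndexSketch {

  /** Returns the position of uniqueId in attributeNames, or -1 if absent. */
  static int findIndex(List<String> attributeNames, String uniqueId) {
    for (int i = 0; i < attributeNames.size(); i++) {
      // Assumption: the match is exact and case-sensitive.
      if (attributeNames.get(i).equals(uniqueId))
        return i;
    }
    return -1; // sentinel: attribute not present
  }
}
```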
-
compare
protected int compare(List<AbstractMerge.SourceAttribute> sources1, List<AbstractMerge.SourceAttribute> sources2)
Compares two lists of source attributes to determine the order in which their mapped attributes should appear in the merged dataset.
- Overrides:
compare in class AbstractMerge
- Parameters:
sources1 - The source attributes of the first mapped attribute.
sources2 - The source attributes of the second mapped attribute.
- Returns:
- -1 if sources1 orders before sources2, 1 if sources1 orders after sources2, otherwise 0.
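The -1/0/1 return contract can be illustrated with a simplified comparator over attribute names. The ordering rule used here (unique ID attribute first, everything else alphabetical) is purely hypothetical and chosen for the example; the actual ordering applied by this class is not stated on this page. AttributeOrderSketch and its methods are likewise invented names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a three-way attribute ordering. */
final class AttributeOrderSketch {

  /** Assumed rule: the unique ID sorts first, other names alphabetically. */
  static int compare(String name1, String name2, String uniqueId) {
    boolean isId1 = name1.equals(uniqueId);
    boolean isId2 = name2.equals(uniqueId);
    if (isId1 && !isId2)
      return -1;
    if (isId2 && !isId1)
      return 1;
    // Normalise String.compareTo's arbitrary magnitude to -1, 0 or 1.
    return Integer.signum(name1.compareTo(name2));
  }

  /** Returns a sorted copy of the names using the rule above. */
  static List<String> sorted(List<String> names, String uniqueId) {
    List<String> copy = new ArrayList<>(names);
    copy.sort((a, b) -> compare(a, b, uniqueId));
    return copy;
  }
}
```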
-
getMappedAttributeName
protected String getMappedAttributeName(AbstractMerge.SourceAttribute source)
Gets the name of the attribute in the merged dataset that the given source attribute maps to.
- Overrides:
getMappedAttributeName in class AbstractMerge
- Parameters:
source - The source attribute.
- Returns:
- The name of the mapped attribute in the merged dataset.
-
getRowSetEnumeration
protected Enumeration<int[]> getRowSetEnumeration()
Allows specific merge methods to specify the order in which rows are placed into the merged dataset, and which rows from the source datasets are used for the source data.
- Specified by:
getRowSetEnumeration in class AbstractMerge
- Returns:
- An enumeration of the source rows, one row for each dataset.
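One way to picture the Enumeration<int[]> return type is as a sequence of row sets, each holding one row index per source dataset. The sketch below makes several assumptions: RowSetSketch is a hypothetical class, each dataset is reduced to a map from ID to row index, and the local ROW_MISSING constant with value -1 merely stands in for the inherited AbstractMerge.ROW_MISSING, whose actual value is not given on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of enumerating row sets joined on an ID. */
final class RowSetSketch {

  /** Stand-in for AbstractMerge.ROW_MISSING; the value -1 is an assumption. */
  static final int ROW_MISSING = -1;

  /** idToRow.get(i) maps each ID to its row index in dataset i. */
  static Enumeration<int[]> rowSets(List<Map<String, Integer>> idToRow,
                                    boolean completeRowsOnly) {
    // Collect IDs in first-seen order (assumes insertion-ordered maps).
    Set<String> ids = new LinkedHashSet<>();
    for (Map<String, Integer> index : idToRow)
      ids.addAll(index.keySet());

    List<int[]> rowSets = new ArrayList<>();
    for (String id : ids) {
      int[] rowSet = new int[idToRow.size()];
      boolean complete = true;
      for (int i = 0; i < rowSet.length; i++) {
        Integer row = idToRow.get(i).get(id);
        rowSet[i] = (row == null) ? ROW_MISSING : row;
        complete &= (row != null);
      }
      // Skip IDs lacking data in some dataset when only complete rows are wanted.
      if (complete || !completeRowsOnly)
        rowSets.add(rowSet);
    }
    return Collections.enumeration(rowSets);
  }
}
```

A consumer would walk the enumeration and, for each int[], read row rowSet[i] from dataset i, treating ROW_MISSING as "no source data for this dataset".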
-
check
protected String check(weka.core.Instances[] datasets)
Hook method for performing checks before attempting the merge.
- Overrides:
check in class AbstractMerge
- Parameters:
datasets - The datasets to merge.
- Returns:
- Null if successfully checked, otherwise an error message.
-
checkAttributeMapping
protected String checkAttributeMapping(Map<String,List<AbstractMerge.SourceAttribute>> attributeMapping)
Makes sure the source data for each mapped attribute is the same type.
- Overrides:
checkAttributeMapping in class AbstractMerge
- Parameters:
attributeMapping - The attribute mapping.
- Returns:
- Null if all mappings are okay, or an error message if not.
-
-