Class AbstractMerge
- java.lang.Object
- adams.core.logging.LoggingObject
- adams.core.logging.CustomLoggingLevelObject
- adams.core.option.AbstractOptionHandler
- adams.flow.transformer.wekadatasetsmerge.AbstractMerge
- All Implemented Interfaces:
- adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.QuickInfoSupporter, adams.core.SizeOfHandler, Serializable
public abstract class AbstractMerge extends adams.core.option.AbstractOptionHandler implements adams.core.QuickInfoSupporter
Ancestor for merge schemes.
- Author:
- Corey Sterling (csterlin at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
protected class AbstractMerge.SourceAttribute
Helper class for determining the mapping from input attributes in the source datasets to output attributes in the merged dataset.
-
Field Summary
protected static String DATASET_KEYWORD
The keyword to replace with the dataset name in attribute renaming.
protected adams.core.base.BaseRegExp[] m_AttributeRenameFindRegexs
The regexes used to find attributes that require renaming.
protected adams.core.base.BaseString[] m_AttributeRenameFormatStrings
The format strings specifying how to rename attributes.
protected int[][] m_ClassAttributes
The set of class attributes for the given datasets.
protected ColumnFinder m_ClassFinder
The column finder for selecting class attributes.
protected adams.core.base.BaseString[] m_DatasetNames
The name of each dataset to use in attribute renaming.
protected weka.core.Instances[] m_Datasets
The source datasets we are merging.
protected boolean m_EnsureEqualValues
Whether to check attributes with multiple sources for equal values among those sources.
protected String m_MergedDatasetName
The name to give the resulting dataset.
protected static int ROW_MISSING
The constant value for datasets that do not have an input row for this output row.
-
Constructor Summary
AbstractMerge()
-
Method Summary
String attributeRenamesExpTipText()
Gets the tip-text for the attribute-renaming regexes option.
String attributeRenamesFormatTipText()
Gets the tip-text for the attribute renaming format strings option.
protected String check(weka.core.Instances[] datasets)
Hook method for performing checks before attempting the merge.
protected String checkAttributeMapping(Map<String,List<AbstractMerge.SourceAttribute>> attributeMapping)
Makes sure the source data for each mapped attribute is the same type.
String classFinderTipText()
Gets the tip-text for the classFinder option.
protected int compare(List<AbstractMerge.SourceAttribute> sources1, List<AbstractMerge.SourceAttribute> sources2)
Compares two lists of source attributes to determine the order in which their mapped attributes should appear in the merged dataset.
protected Map<String,List<AbstractMerge.SourceAttribute>> createAttributeMapping()
Creates a mapping from the attributes in each input dataset to the corresponding attribute in the merged dataset.
protected weka.core.Instances createEmptyResultantDataset(Map<String,List<AbstractMerge.SourceAttribute>> attributeMapping)
Creates the resultant dataset, ready to be filled with data.
protected weka.core.Attribute createMappedAttribute(String name, List<AbstractMerge.SourceAttribute> sources)
Creates the attribute for the output merged dataset for the given attribute mapping.
String datasetNamesTipText()
Gets the tip-text for the dataset names option.
void defineOptions()
Adds options to the internal list of options.
String ensureEqualValuesTipText()
Gets the tip-text for the ensure-equal-values option.
adams.core.base.BaseRegExp[] getAttributeRenamesExp()
Gets the array of attribute rename expressions.
adams.core.base.BaseString[] getAttributeRenamesFormat()
Gets the array of format strings used for attribute renaming.
ColumnFinder getClassFinder()
Gets the finder to use for finding class attributes in the source datasets.
adams.core.base.BaseString[] getDatasetNames()
Gets the list of names to use in attribute renaming in place of the {DATASET} keyword.
boolean getEnsureEqualValues()
Gets whether to check that all data sources for a merged attribute have the same value.
protected String getMappedAttributeName(AbstractMerge.SourceAttribute source)
Gets the name of the attribute in the merged dataset that the given source attribute maps to.
String getOutputName()
Gets the name to use for the merged dataset.
String getQuickInfo()
Returns a quick info about the object, which can be displayed in the GUI.
protected abstract Enumeration<int[]> getRowSetEnumeration()
Allows specific merge methods to specify the order in which rows are placed into the merged dataset, and which rows from the source datasets are used for the source data.
protected Object getValue(weka.core.Instance toGetFrom, int attributeIndex)
Gets the value of the specified attribute from the given Instance.
protected Object getValueEnsureEqual(int[] rowSet, List<AbstractMerge.SourceAttribute> sourceAttributeElements)
Gets the value of the mapped attribute, ensuring that all possible sources either provide a missing value or the same value as each other.
protected Object getValueFirstAvailable(int[] rowSet, List<AbstractMerge.SourceAttribute> sourceAttributes)
Gets the first encountered source value for a merged attribute.
protected boolean isAnyClassAttribute(List<AbstractMerge.SourceAttribute> sources)
Checks if any of the source attributes in the given list is a class attribute.
protected boolean isClassAttribute(int datasetIndex, int attributeIndex)
Whether the given attribute is a class attribute.
protected boolean isClassAttribute(AbstractMerge.SourceAttribute source)
Checks if the given source attribute is a class attribute.
weka.core.Instances merge(weka.core.Instances[] datasets)
Merges the datasets.
protected weka.core.Instance newInstanceForDataset(weka.core.Instances dataset)
Creates a new dense instance of the size expected by the given dataset.
String outputNameTipText()
Gets the tip-text for the output name option.
protected void recordClassAttributes()
Scans the datasets for attributes that should be considered classes, and keeps a record of them.
protected void resetInternalState(weka.core.Instances[] datasets)
Resets the internal state of the merge method when new datasets are supplied.
void setAttributeRenamesExp(adams.core.base.BaseRegExp[] value)
Sets the array of attribute rename expressions.
void setAttributeRenamesFormat(adams.core.base.BaseString[] value)
Sets the array of format strings used for attribute renaming.
void setClassFinder(ColumnFinder value)
Sets the finder to use for finding class attributes in the source datasets.
void setDatasetNames(adams.core.base.BaseString[] value)
Sets the list of names to use in attribute renaming in place of the {DATASET} keyword.
void setEnsureEqualValues(boolean value)
Sets whether to check that all data sources for a merged attribute have the same value.
void setOutputName(String value)
Sets the name to use for the merged dataset.
protected void setValue(weka.core.Instance toSet, int attributeIndex, Object value)
Sets the value of the given attribute in the given instance to the given value (handles object conversion).
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, globalInfo, initialize, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
-
-
-
Field Detail
-
DATASET_KEYWORD
protected static final String DATASET_KEYWORD
The keyword to replace with the dataset name in attribute renaming.
- See Also:
- Constant Field Values
-
ROW_MISSING
protected static final int ROW_MISSING
The constant value for datasets that do not have an input row for this output row.
- See Also:
- Constant Field Values
-
m_ClassFinder
protected ColumnFinder m_ClassFinder
The column finder for selecting class attributes.
-
m_DatasetNames
protected adams.core.base.BaseString[] m_DatasetNames
The name of each dataset to use in attribute renaming.
-
m_AttributeRenameFindRegexs
protected adams.core.base.BaseRegExp[] m_AttributeRenameFindRegexs
The regexes used to find attributes that require renaming.
-
m_AttributeRenameFormatStrings
protected adams.core.base.BaseString[] m_AttributeRenameFormatStrings
The format strings specifying how to rename attributes.
-
m_MergedDatasetName
protected String m_MergedDatasetName
The name to give the resulting dataset.
-
m_EnsureEqualValues
protected boolean m_EnsureEqualValues
Whether to check attributes with multiple sources for equal values among those sources.
-
m_Datasets
protected weka.core.Instances[] m_Datasets
The source datasets we are merging.
-
m_ClassAttributes
protected int[][] m_ClassAttributes
The set of class attributes for the given datasets.
-
-
Method Detail
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
- defineOptions in interface adams.core.option.OptionHandler
- Overrides:
- defineOptions in class adams.core.option.AbstractOptionHandler
-
getClassFinder
public ColumnFinder getClassFinder()
Gets the finder to use for finding class attributes in the source datasets.
- Returns:
- The class-attribute finder.
-
setClassFinder
public void setClassFinder(ColumnFinder value)
Sets the finder to use for finding class attributes in the source datasets.
- Parameters:
value
- The class-attribute finder.
-
classFinderTipText
public String classFinderTipText()
Gets the tip-text for the classFinder option.
- Returns:
- The tip-text as a String.
-
getDatasetNames
public adams.core.base.BaseString[] getDatasetNames()
Gets the list of names to use in attribute renaming in place of the {DATASET} keyword.
- Returns:
- The list of dataset names.
-
setDatasetNames
public void setDatasetNames(adams.core.base.BaseString[] value)
Sets the list of names to use in attribute renaming in place of the {DATASET} keyword.
- Parameters:
value
- The list of dataset names.
-
datasetNamesTipText
public String datasetNamesTipText()
Gets the tip-text for the dataset names option.
- Returns:
- The tip-text as a String.
-
getAttributeRenamesExp
public adams.core.base.BaseRegExp[] getAttributeRenamesExp()
Gets the array of attribute rename expressions.
- Returns:
- The array of regexes.
-
setAttributeRenamesExp
public void setAttributeRenamesExp(adams.core.base.BaseRegExp[] value)
Sets the array of attribute rename expressions.
- Parameters:
value
- The array of regexes.
-
attributeRenamesExpTipText
public String attributeRenamesExpTipText()
Gets the tip-text for the attribute-renaming regexes option.
- Returns:
- The tip-text as a String.
-
getAttributeRenamesFormat
public adams.core.base.BaseString[] getAttributeRenamesFormat()
Gets the array of format strings used for attribute renaming.
- Returns:
- The array of format strings.
-
setAttributeRenamesFormat
public void setAttributeRenamesFormat(adams.core.base.BaseString[] value)
Sets the array of format strings used for attribute renaming.
- Parameters:
value
- The array of format strings.
-
attributeRenamesFormatTipText
public String attributeRenamesFormatTipText()
Gets the tip-text for the attribute renaming format strings option.
- Returns:
- The tip-text as a String.
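The renaming options above (dataset names, find regexes, format strings and the {DATASET} keyword) can be illustrated with a minimal, JDK-only sketch. It is not the ADAMS implementation: the helper name rename, the pairing of one regex with one format string, and the requirement that the regex match the whole attribute name are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RenameSketch {
    // The {DATASET} keyword is the one named in the documentation above.
    static final String DATASET_KEYWORD = "{DATASET}";

    /**
     * Hypothetical helper: applies one find-regex/format-string pair to one
     * attribute name. Assumes the regex must match the whole name.
     */
    static String rename(String attributeName, String findRegex,
                         String format, String datasetName) {
        // Substitute the dataset name for the keyword, escaping '$' and '\'
        // so the name is treated literally in the replacement string.
        String replacement =
            format.replace(DATASET_KEYWORD, Matcher.quoteReplacement(datasetName));
        Matcher matcher = Pattern.compile(findRegex).matcher(attributeName);
        // Names that do not match the regex are left unchanged.
        if (!matcher.matches()) {
            return attributeName;
        }
        // replaceFirst resets the matcher and expands group references ($0, $1, ...).
        return matcher.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        System.out.println(rename("temp", "(.+)", "{DATASET}-$1", "sensorA"));
        System.out.println(rename("id", "temp.*", "{DATASET}-$0", "sensorA"));
    }
}
```

Using replaceFirst after a full match avoids the extra empty match that String.replaceAll can produce with patterns such as (.*).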
-
getOutputName
public String getOutputName()
Gets the name to use for the merged dataset.
- Returns:
- The name to use.
-
setOutputName
public void setOutputName(String value)
Sets the name to use for the merged dataset.
- Parameters:
value
- The name to use.
-
outputNameTipText
public String outputNameTipText()
Gets the tip-text for the output name option.
- Returns:
- The tip-text as a String.
-
getEnsureEqualValues
public boolean getEnsureEqualValues()
Gets whether to check that all data sources for a merged attribute have the same value.
- Returns:
- True if value equality should be checked, false if not.
-
setEnsureEqualValues
public void setEnsureEqualValues(boolean value)
Sets whether to check that all data sources for a merged attribute have the same value.
- Parameters:
value
- True if value equality should be checked, false if not.
-
ensureEqualValuesTipText
public String ensureEqualValuesTipText()
Gets the tip-text for the ensure-equal-values option.
- Returns:
- The tip-text as a String.
-
getQuickInfo
public String getQuickInfo()
Returns a quick info about the object, which can be displayed in the GUI.
The default implementation returns null.
- Specified by:
- getQuickInfo in interface adams.core.QuickInfoSupporter
- Returns:
- null if no info available, otherwise short string
-
setValue
protected void setValue(weka.core.Instance toSet, int attributeIndex, Object value)
Sets the value of the given attribute in the given instance to the given value (handles object conversion).
- Parameters:
toSet
- The instance against which the value should be set.
attributeIndex
- The index of the attribute against which to set the value.
value
- The value to set the attribute to.
-
getValue
protected Object getValue(weka.core.Instance toGetFrom, int attributeIndex)
Gets the value of the specified attribute from the given Instance.
- Parameters:
toGetFrom
- The instance to get a value from.
attributeIndex
- The index of the value's attribute.
- Returns:
- The value of the instance at the given index.
-
newInstanceForDataset
protected weka.core.Instance newInstanceForDataset(weka.core.Instances dataset)
Creates a new dense instance of the size expected by the given dataset.
- Parameters:
dataset
- The dataset to create a new instance for.
- Returns:
- The created instance.
-
check
protected String check(weka.core.Instances[] datasets)
Hook method for performing checks before attempting the merge.
- Parameters:
datasets
- The datasets to merge.
- Returns:
- Null if successfully checked, otherwise an error message.
-
checkAttributeMapping
protected String checkAttributeMapping(Map<String,List<AbstractMerge.SourceAttribute>> attributeMapping)
Makes sure the source data for each mapped attribute is the same type.
- Parameters:
attributeMapping
- The attribute mapping.
- Returns:
- Null if all mappings are okay, or an error message if not.
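The null-or-error-message convention used by check and checkAttributeMapping can be sketched without Weka as follows. This is only an illustration under stated assumptions: source types are simplified to plain strings, and the class TypeCheckSketch, the method checkTypes and the wording of the error message are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TypeCheckSketch {
    /**
     * Hypothetical simplification: sourceTypes maps each merged attribute
     * name to the type names of the source attributes that feed it.
     * Returns null when every merged attribute has consistently typed
     * sources, otherwise an error message describing the first mismatch.
     */
    static String checkTypes(Map<String, List<String>> sourceTypes) {
        for (Map.Entry<String, List<String>> entry : sourceTypes.entrySet()) {
            List<String> types = entry.getValue();
            for (String type : types) {
                // Compare every source type against the first one.
                if (!type.equals(types.get(0))) {
                    return "Attribute '" + entry.getKey() + "' mixes source types "
                        + types.get(0) + " and " + type;
                }
            }
        }
        return null; // null signals success, as in the Returns description above
    }

    public static void main(String[] args) {
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        mapping.put("id", Arrays.asList("numeric", "numeric"));
        mapping.put("temp", Arrays.asList("numeric", "nominal"));
        System.out.println(checkTypes(mapping));
    }
}
```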
-
merge
public weka.core.Instances merge(weka.core.Instances[] datasets)
Merges the datasets.
- Parameters:
datasets
- The datasets to merge.
- Returns:
- The merged dataset.
-
getValueFirstAvailable
protected Object getValueFirstAvailable(int[] rowSet, List<AbstractMerge.SourceAttribute> sourceAttributes)
Gets the first encountered source value for a merged attribute.
- Parameters:
rowSet
- The row-set of source data.
sourceAttributes
- The source attribute mapping elements.
- Returns:
- The value of the merged attribute.
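A first-available lookup over a row-set can be sketched in plain Java as below. The sketch makes several assumptions that this page does not confirm: ROW_MISSING is taken to be -1, a missing value is modelled as null, the data is a plain array instead of weka.core.Instances, and "first encountered" is read as "first non-missing source in dataset order".

```java
final class FirstAvailableSketch {
    // Assumed sentinel; the actual value of ROW_MISSING is not given on this page.
    static final int ROW_MISSING = -1;

    /**
     * data[d][r] holds the value of one shared attribute in row r of
     * dataset d (null models a missing value). rowSet[d] is the row of
     * dataset d that feeds the current output row, or ROW_MISSING.
     */
    static Object firstAvailable(int[] rowSet, Object[][] data) {
        for (int d = 0; d < rowSet.length; d++) {
            int row = rowSet[d];
            if (row == ROW_MISSING) {
                continue; // dataset d contributes no row to this output row
            }
            Object value = data[d][row];
            if (value != null) {
                return value; // the first non-missing source wins
            }
        }
        return null; // every source was missing
    }

    public static void main(String[] args) {
        Object[][] data = { {"a", null}, {"x", "y"} };
        System.out.println(firstAvailable(new int[] {1, 0}, data));
    }
}
```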
-
getValueEnsureEqual
protected Object getValueEnsureEqual(int[] rowSet, List<AbstractMerge.SourceAttribute> sourceAttributeElements)
Gets the value of the mapped attribute, ensuring that all possible sources either provide a missing value or the same value as each other.
- Parameters:
rowSet
- The row-set of source data.
sourceAttributeElements
- The source attribute mapping elements.
- Returns:
- The value of the merged attribute.
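The equality check described above can be sketched as follows. How the real method reports a conflict is not stated on this page, so throwing IllegalStateException is an assumption, as are the -1 value for ROW_MISSING and the use of null for a missing value.

```java
final class EnsureEqualSketch {
    // Assumed sentinel; the actual value of ROW_MISSING is not given on this page.
    static final int ROW_MISSING = -1;

    /**
     * Returns the single value shared by all non-missing sources, or null
     * if every source is missing. Conflicting values raise an exception
     * (an assumed failure mode, chosen for illustration).
     */
    static Object ensureEqual(int[] rowSet, Object[][] data) {
        Object result = null;
        for (int d = 0; d < rowSet.length; d++) {
            if (rowSet[d] == ROW_MISSING) {
                continue; // no row from dataset d for this output row
            }
            Object value = data[d][rowSet[d]];
            if (value == null) {
                continue; // missing values never conflict
            }
            if (result == null) {
                result = value; // remember the first real value
            } else if (!result.equals(value)) {
                throw new IllegalStateException(
                    "Conflicting source values: " + result + " vs " + value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Object[][] data = { {1.5, null}, {1.5, 2.0} };
        System.out.println(ensureEqual(new int[] {0, 0}, data));
    }
}
```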
-
createAttributeMapping
protected Map<String,List<AbstractMerge.SourceAttribute>> createAttributeMapping()
Creates a mapping from the attributes in each input dataset to the corresponding attribute in the merged dataset.
- Returns:
- The mapping from each merged attribute name to the source attributes that feed it.
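The shape of the returned map (merged attribute name to list of source attributes) can be illustrated with a small JDK-only sketch. The nested Source class is a hypothetical stand-in for AbstractMerge.SourceAttribute, whose fields are not listed on this page, and the sketch assumes that no renaming is applied, so attributes with the same name in different datasets collapse into one merged attribute.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MappingSketch {
    /** Hypothetical stand-in for AbstractMerge.SourceAttribute. */
    static final class Source {
        final int datasetIndex;
        final String attributeName;

        Source(int datasetIndex, String attributeName) {
            this.datasetIndex = datasetIndex;
            this.attributeName = attributeName;
        }
    }

    /** attributeNames[d] lists the attribute names of dataset d. */
    static Map<String, List<Source>> createMapping(String[][] attributeNames) {
        // LinkedHashMap keeps merged attributes in first-seen order.
        Map<String, List<Source>> mapping = new LinkedHashMap<>();
        for (int d = 0; d < attributeNames.length; d++) {
            for (String name : attributeNames[d]) {
                // Assumption: the merged name equals the source name.
                mapping.computeIfAbsent(name, k -> new ArrayList<>())
                       .add(new Source(d, name));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, List<Source>> mapping = createMapping(
            new String[][] { {"id", "temp"}, {"id", "humidity"} });
        for (Map.Entry<String, List<Source>> e : mapping.entrySet()) {
            System.out.println(e.getKey() + " <- " + e.getValue().size() + " source(s)");
        }
    }
}
```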
-
isAnyClassAttribute
protected boolean isAnyClassAttribute(List<AbstractMerge.SourceAttribute> sources)
Checks if any of the source attributes in the given list is a class attribute.
- Parameters:
sources
- The source attributes to check.
- Returns:
- True if a source attribute is a class, false if none are.
-
isClassAttribute
protected boolean isClassAttribute(AbstractMerge.SourceAttribute source)
Checks if the given source attribute is a class attribute.
- Parameters:
source
- The source attribute to check.
- Returns:
- True if the source is a class attribute, false if not.
-
isClassAttribute
protected boolean isClassAttribute(int datasetIndex, int attributeIndex)
Whether the given attribute is a class attribute.
- Parameters:
datasetIndex
- The dataset the attribute is in.
attributeIndex
- The index of the attribute in the dataset.
- Returns:
- True if the given attribute is a class attribute, false if not.
-
recordClassAttributes
protected void recordClassAttributes()
Scans the datasets for attributes that should be considered classes, and keeps a record of them.
-
createEmptyResultantDataset
protected weka.core.Instances createEmptyResultantDataset(Map<String,List<AbstractMerge.SourceAttribute>> attributeMapping)
Creates the resultant dataset, ready to be filled with data.
- Parameters:
attributeMapping
- The mapping from merged attribute names to their source attributes.
- Returns:
- The empty Instances object for the merged dataset.
-
createMappedAttribute
protected weka.core.Attribute createMappedAttribute(String name, List<AbstractMerge.SourceAttribute> sources)
Creates the attribute for the output merged dataset for the given attribute mapping.
- Parameters:
name
- The name of the mapped attribute.
sources
- The list of mappings that the attribute maps to.
- Returns:
- The attribute for the merged dataset.
-
compare
protected int compare(List<AbstractMerge.SourceAttribute> sources1, List<AbstractMerge.SourceAttribute> sources2)
Compares two lists of source attributes to determine the order in which their mapped attributes should appear in the merged dataset.
- Parameters:
sources1
- The source attributes of the first mapped attribute.
sources2
- The source attributes of the second mapped attribute.
- Returns:
- -1 if sources1 orders before sources2, 1 if sources1 orders after sources2, otherwise 0.
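To show how a three-way comparison of source lists can drive the order of merged attributes, here is a minimal sketch. The ordering rule (by the first source's dataset index, then by its attribute index) is purely an assumption for illustration and may differ from what AbstractMerge does; Source and orderedNames are hypothetical names, and each source list is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CompareSketch {
    /** Hypothetical stand-in for AbstractMerge.SourceAttribute. */
    static final class Source {
        final int datasetIndex;
        final int attributeIndex;

        Source(int datasetIndex, int attributeIndex) {
            this.datasetIndex = datasetIndex;
            this.attributeIndex = attributeIndex;
        }
    }

    /** Returns -1, 0 or 1, using an assumed "earliest first source" rule. */
    static int compare(List<Source> sources1, List<Source> sources2) {
        Source a = sources1.get(0);
        Source b = sources2.get(0);
        int byDataset = Integer.compare(a.datasetIndex, b.datasetIndex);
        if (byDataset != 0) {
            return Integer.signum(byDataset);
        }
        return Integer.signum(Integer.compare(a.attributeIndex, b.attributeIndex));
    }

    /** Sorts the merged attribute names with the comparison above. */
    static List<String> orderedNames(Map<String, List<Source>> mapping) {
        List<String> names = new ArrayList<>(mapping.keySet());
        names.sort((n1, n2) -> compare(mapping.get(n1), mapping.get(n2)));
        return names;
    }
}
```

For example, with merged attributes b from (dataset 1, attribute 0), a from (dataset 0, attribute 2) and c from (dataset 0, attribute 1), orderedNames yields c, a, b under this assumed rule.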
-
getMappedAttributeName
protected String getMappedAttributeName(AbstractMerge.SourceAttribute source)
Gets the name of the attribute in the merged dataset that the given source attribute maps to.
- Parameters:
source
- The source attribute.
- Returns:
- The name of the mapped attribute in the merged dataset.
-
resetInternalState
protected void resetInternalState(weka.core.Instances[] datasets)
Resets the internal state of the merge method when new datasets are supplied.
- Parameters:
datasets
- The datasets being merged.
-
getRowSetEnumeration
protected abstract Enumeration<int[]> getRowSetEnumeration()
Allows specific merge methods to specify the order in which rows are placed into the merged dataset, and which rows from the source datasets are used for the source data.
- Returns:
- An enumeration of the source rows, one row for each dataset.
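A row-set enumeration of the kind described above can be sketched with the JDK alone. The sketch pairs rows by position, which is only one possible strategy and not necessarily one that any concrete ADAMS merge scheme uses; the -1 value for ROW_MISSING and the names RowSetSketch and byPosition are assumptions.

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;

final class RowSetSketch {
    // Assumed sentinel; the actual value of ROW_MISSING is not given on this page.
    static final int ROW_MISSING = -1;

    /**
     * rowCounts[d] is the number of rows in dataset d. Output row i uses
     * row i of every dataset that has one, and ROW_MISSING for the rest.
     */
    static Enumeration<int[]> byPosition(final int[] rowCounts) {
        int max = 0;
        for (int count : rowCounts) {
            max = Math.max(max, count);
        }
        final int total = max;
        return new Enumeration<int[]>() {
            private int next = 0;

            @Override
            public boolean hasMoreElements() {
                return next < total;
            }

            @Override
            public int[] nextElement() {
                if (next >= total) {
                    throw new NoSuchElementException();
                }
                // One entry per dataset, as in the Returns description above.
                int[] rowSet = new int[rowCounts.length];
                for (int d = 0; d < rowCounts.length; d++) {
                    rowSet[d] = next < rowCounts[d] ? next : ROW_MISSING;
                }
                next++;
                return rowSet;
            }
        };
    }
}
```

Each int[] produced here has one entry per dataset, which is the form that the rowSet parameters of getValueFirstAvailable and getValueEnsureEqual appear to expect.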
-
-