Class MergeManyAttributes
java.lang.Object
  weka.filters.Filter
    weka.filters.SimpleFilter
      weka.filters.SimpleBatchFilter
        weka.filters.unsupervised.attribute.MergeManyAttributes
All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
public class MergeManyAttributes extends weka.filters.SimpleBatchFilter implements weka.filters.UnsupervisedFilter
Merges two or more attributes and offers various strategies for values that differ or are missing.
Uses the common subsequence (either from start or end) of the attribute names as the name of the merged attribute, otherwise their concatenation (separated by '-'). If this name is already present, a number is appended to make it unique.
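The naming rules above can be sketched in plain Java. This is an illustration only, not the filter's implementation: the class MergedNameSketch, its methods, and the choice to start numbering at 2 are assumptions, and the common part of the names is taken as an already computed input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of the MergeManyAttributes API. */
class MergedNameSketch {

  /** Strips any of the given characters from both ends of the name. */
  static String trim(String name, String removeChars) {
    int start = 0;
    int end = name.length();
    while (start < end && removeChars.indexOf(name.charAt(start)) >= 0)
      start++;
    while (end > start && removeChars.indexOf(name.charAt(end - 1)) >= 0)
      end--;
    return name.substring(start, end);
  }

  /**
   * Picks the merged name: the trimmed common part if it is non-empty,
   * otherwise the attribute names joined with '-'. A counter is appended
   * until the name no longer clashes with an existing attribute name.
   */
  static String mergedName(String common, List<String> names,
                           String removeChars, Set<String> existing) {
    String base = trim(common, removeChars);
    if (base.isEmpty())
      base = String.join("-", names);
    String result = base;
    int suffix = 2; // assumption: the actual numbering scheme is not documented
    while (existing.contains(result)) {
      result = base + suffix;
      suffix++;
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> existing = new HashSet<>(Arrays.asList("temp", "id"));
    // common part "temp_" is trimmed to "temp", which clashes and gets a number
    System.out.println(mergedName("temp_", Arrays.asList("temp_a", "temp_b"), "-_.", existing));
    // no common part: fall back to the concatenation
    System.out.println(mergedName("", Arrays.asList("height", "width"), "-_.", existing));
  }
}
```

The removeChars argument mirrors the documented default of the -remove-chars option (-_.).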
The merged attribute can either be left at the default position (the position of whichever merged attribute comes first) or moved to a specific one.
If one of the attributes to be merged is the current class attribute, the newly created merged attribute will become the new class attribute.
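The position and class-attribute rules can be illustrated with a small sketch. It assumes that the index string follows the usual weka.core.SingleIndex conventions (1-based numbers plus "first" and "last") and that the index refers to the dataset after merging; MergedPositionSketch and its methods are hypothetical names, not part of the filter.

```java
/** Hypothetical helper; not part of the MergeManyAttributes API. */
class MergedPositionSketch {

  /**
   * Returns the 0-based position of the merged attribute.
   * An empty index string means the default position, i.e. the smallest
   * position among the attributes being merged.
   */
  static int targetPosition(int[] mergedPositions, String mergedIndex,
                            int numAttsAfterMerge) {
    if (mergedIndex.isEmpty()) {
      int min = mergedPositions[0];
      for (int p : mergedPositions)
        min = Math.min(min, p);
      return min;
    }
    if (mergedIndex.equalsIgnoreCase("first"))
      return 0;
    if (mergedIndex.equalsIgnoreCase("last"))
      return numAttsAfterMerge - 1;
    int pos = Integer.parseInt(mergedIndex) - 1; // 1-based to 0-based
    if (pos < 0 || pos >= numAttsAfterMerge)
      throw new IllegalArgumentException("Index out of range: " + mergedIndex);
    return pos;
  }

  /** The merged attribute becomes the class if any merged attribute was the class. */
  static boolean mergedBecomesClass(int[] mergedPositions, int classIndex) {
    for (int p : mergedPositions) {
      if (p == classIndex)
        return true;
    }
    return false;
  }

  public static void main(String[] args) {
    int[] merged = {5, 2, 7};
    System.out.println(targetPosition(merged, "", 8));
    System.out.println(targetPosition(merged, "last", 8));
    System.out.println(mergedBecomesClass(merged, 7));
  }
}
```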
Valid options are:
-D Turns on output of debugging information.
-att-name <att name> The name of the attribute, can be supplied multiple times.
-remove-chars <chars> The characters to remove from the start/end of the generated name for the merged attribute. (default: -_.)
-merged-index <position> The new position for the merged attribute. Empty string is default position, i.e., either the position of the first or second attribute (whichever comes first) (default: )
-differ <MISSING|AVERAGE> The strategy to apply in case the values of the attributes differ. (default: MISSING)
-one-missing <MISSING|PRESENT> The strategy to apply in case one of the values is missing. (default: MISSING)
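To make the -differ and -one-missing strategies concrete, here is a minimal sketch for numeric values. It is not the filter's code: MergeValueSketch, its enums, and the order in which the two strategies are checked are assumptions. Missing values are represented as Double.NaN, which is how Weka encodes them.

```java
/** Hypothetical helper; not part of the MergeManyAttributes API. */
class MergeValueSketch {

  enum Differ { MISSING, AVERAGE }

  enum OneMissing { MISSING, PRESENT }

  /** Merges the values of one row; NaN stands for a missing value. */
  static double merge(double[] values, Differ differ, OneMissing oneMissing) {
    double first = Double.NaN; // first present value
    double sum = 0;
    int present = 0;
    boolean allEqual = true;
    for (double v : values) {
      if (Double.isNaN(v))
        continue;
      if (present == 0)
        first = v;
      else if (v != first)
        allEqual = false;
      sum += v;
      present++;
    }
    if (present == 0)
      return Double.NaN; // nothing to merge
    if (present < values.length) // at least one value is missing
      return (oneMissing == OneMissing.PRESENT) ? first : Double.NaN;
    if (allEqual)
      return first;
    // all values present but they differ
    return (differ == Differ.AVERAGE) ? sum / present : Double.NaN;
  }

  public static void main(String[] args) {
    System.out.println(merge(new double[]{1.0, 3.0}, Differ.AVERAGE, OneMissing.MISSING));
    System.out.println(merge(new double[]{Double.NaN, 4.0}, Differ.MISSING, OneMissing.PRESENT));
  }
}
```

With the documented defaults (MISSING for both options), any disagreement or gap among the inputs yields a missing merged value in this sketch.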
Version: $Revision$
Author: FracPete (fracpete at waikato dot ac dot nz)
See Also: Serialized Form
Field Summary
static String DEFAULT_REMOVE_CHARS
characters to remove from start/end of the merged name.
protected adams.core.base.BaseString[] m_AttributeNames
the attribute names.
protected int m_Differ
how to handle differing values.
protected String m_Merged
the name of the merged attribute.
protected weka.core.SingleIndex m_MergedIndex
the position for the merged attribute (empty = leave at default position).
protected int m_OneMissing
what to do if one value is missing.
protected String m_RemoveChars
the characters to remove from the merged name (start/end).
static int ONEMISSING_MISSING
what to do if one is missing: missing.
static int ONEMISSING_USE_FIRST_PRESENT
what to do if one is missing: use first present value.
static weka.core.Tag[] TAGS_ONEMISSING
static weka.core.Tag[] TAGS_VALUESDIFFER
the types of how to handle differing values.
static int VALUESDIFFER_AVERAGE
how to handle differing values: average.
static int VALUESDIFFER_MISSING
how to handle differing values: missing.
-
Constructor Summary
MergeManyAttributes()
-
Method Summary
String attributeNamesTipText()
Returns the tip text for this property.
protected String commonSubsequence(String s1, String s2, boolean forward)
Determines the common subsequence of the two strings.
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
Determines the output format based on the input format and returns it.
String differTipText()
Returns the tip text for this property.
adams.core.base.BaseString[] getAttributeNames()
Gets the names of the attributes.
weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter.
weka.core.SelectedTag getDiffer()
Gets the type of strategy to apply if the two values differ.
String getMergedIndex()
Gets the position for the merged attribute.
weka.core.SelectedTag getOneMissing()
Gets the type of strategy to apply if one of the values is missing.
String[] getOptions()
Returns the options of the current setup.
String getRemoveChars()
Gets the characters to remove from start/end of the generated name.
String getRevision()
Returns the revision string.
String globalInfo()
Returns a string describing this filter.
Enumeration listOptions()
Gets an enumeration describing the available options.
static void main(String[] args)
Runs the filter with the given arguments.
String mergedIndexTipText()
Returns the tip text for this property.
String oneMissingTipText()
Returns the tip text for this property.
protected weka.core.Instances process(weka.core.Instances instances)
Processes the given data (may change the provided dataset) and returns the modified version.
String removeCharsTipText()
Returns the tip text for this property.
void setAttributeNames(adams.core.base.BaseString[] value)
Sets the names of the attributes.
void setDiffer(weka.core.SelectedTag value)
Sets the type of strategy to apply if the two values differ.
void setMergedIndex(String value)
Sets the position for the merged attribute.
void setOneMissing(weka.core.SelectedTag value)
Sets the type of strategy to apply if one of the values is missing.
void setOptions(String[] options)
Parses the options for this object.
void setRemoveChars(String value)
Sets the characters to remove from start/end of the generated name.
-
Methods inherited from class weka.filters.SimpleBatchFilter
allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input
-
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
Field Detail
-
VALUESDIFFER_MISSING
public static final int VALUESDIFFER_MISSING
how to handle differing values: missing.
See Also: Constant Field Values
-
VALUESDIFFER_AVERAGE
public static final int VALUESDIFFER_AVERAGE
how to handle differing values: average.
See Also: Constant Field Values
-
TAGS_VALUESDIFFER
public static final weka.core.Tag[] TAGS_VALUESDIFFER
the types of how to handle differing values.
-
ONEMISSING_MISSING
public static final int ONEMISSING_MISSING
what to do if one is missing: missing.
See Also: Constant Field Values
-
ONEMISSING_USE_FIRST_PRESENT
public static final int ONEMISSING_USE_FIRST_PRESENT
what to do if one is missing: use first present value.
See Also: Constant Field Values
-
TAGS_ONEMISSING
public static final weka.core.Tag[] TAGS_ONEMISSING
-
DEFAULT_REMOVE_CHARS
public static final String DEFAULT_REMOVE_CHARS
characters to remove from start/end of the merged name.
See Also: Constant Field Values
-
m_AttributeNames
protected adams.core.base.BaseString[] m_AttributeNames
the attribute names.
-
m_Differ
protected int m_Differ
how to handle differing values.
-
m_OneMissing
protected int m_OneMissing
what to do if one value is missing.
-
m_RemoveChars
protected String m_RemoveChars
the characters to remove from the merged name (start/end).
-
m_MergedIndex
protected weka.core.SingleIndex m_MergedIndex
the position for the merged attribute (empty = leave at default position).
-
m_Merged
protected String m_Merged
the name of the merged attribute.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
Specified by: globalInfo in class weka.filters.SimpleFilter
Returns: a description of the filter suitable for displaying in the explorer/experimenter gui
-
listOptions
public Enumeration listOptions()
Gets an enumeration describing the available options.
Specified by: listOptions in interface weka.core.OptionHandler
Overrides: listOptions in class weka.filters.Filter
Returns: an enumeration of all the available options.
-
getOptions
public String[] getOptions()
Returns the options of the current setup.
Specified by: getOptions in interface weka.core.OptionHandler
Overrides: getOptions in class weka.filters.Filter
Returns: the current options
-
setOptions
public void setOptions(String[] options) throws Exception
Parses the options for this object.
Valid options are:
-D Turns on output of debugging information.
-att-name <att name> The name of the attribute, can be supplied multiple times.
-remove-chars <chars> The characters to remove from the start/end of the generated name for the merged attribute. (default: -_.)
-merged-index <position> The new position for the merged attribute. Empty string is default position, i.e., either the position of the first or second attribute (whichever comes first) (default: )
-differ <MISSING|AVERAGE> The strategy to apply in case the values of the attributes differ. (default: MISSING)
-one-missing <MISSING|PRESENT> The strategy to apply in case one of the values is missing. (default: MISSING)
Specified by: setOptions in interface weka.core.OptionHandler
Overrides: setOptions in class weka.filters.Filter
Parameters: options - the options to use
Throws: Exception - if the option setting fails
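Since -att-name can be supplied multiple times, assembling the option array by hand is easy to get wrong. The following sketch builds such an array from the flags listed above; MergeOptionsSketch and buildOptions are hypothetical names, and the sketch itself does not call into Weka.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the MergeManyAttributes API. */
class MergeOptionsSketch {

  /** Builds an option array using the flags documented for setOptions. */
  static String[] buildOptions(List<String> attNames, String differ,
                               String oneMissing, String mergedIndex) {
    List<String> opts = new ArrayList<>();
    for (String name : attNames) { // one -att-name flag per attribute
      opts.add("-att-name");
      opts.add(name);
    }
    opts.add("-differ");
    opts.add(differ);
    opts.add("-one-missing");
    opts.add(oneMissing);
    if (!mergedIndex.isEmpty()) { // empty string keeps the default position
      opts.add("-merged-index");
      opts.add(mergedIndex);
    }
    return opts.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] opts = buildOptions(Arrays.asList("temp_a", "temp_b"), "AVERAGE", "PRESENT", "");
    System.out.println(String.join(" ", opts));
  }
}
```

The resulting array has the shape expected by setOptions(String[]).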
-
setAttributeNames
public void setAttributeNames(adams.core.base.BaseString[] value)
Sets the names of the attributes.
Parameters: value - the names
-
getAttributeNames
public adams.core.base.BaseString[] getAttributeNames()
Gets the names of the attributes.
Returns: the names
-
attributeNamesTipText
public String attributeNamesTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMergedIndex
public void setMergedIndex(String value)
Sets the position for the merged attribute.
Parameters: value - the position, empty string for default
-
getMergedIndex
public String getMergedIndex()
Gets the position for the merged attribute.
Returns: the position, empty string for default
-
mergedIndexTipText
public String mergedIndexTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the explorer/experimenter gui
-
setRemoveChars
public void setRemoveChars(String value)
Sets the characters to remove from start/end of the generated name.
Parameters: value - the characters
-
getRemoveChars
public String getRemoveChars()
Gets the characters to remove from start/end of the generated name.
Returns: the characters
-
removeCharsTipText
public String removeCharsTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDiffer
public void setDiffer(weka.core.SelectedTag value)
Sets the type of strategy to apply if the two values differ.
Parameters: value - the strategy
-
getDiffer
public weka.core.SelectedTag getDiffer()
Gets the type of strategy to apply if the two values differ.
Returns: the strategy
-
differTipText
public String differTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the explorer/experimenter gui
-
setOneMissing
public void setOneMissing(weka.core.SelectedTag value)
Sets the type of strategy to apply if one of the values is missing.
Parameters: value - the strategy
-
getOneMissing
public weka.core.SelectedTag getOneMissing()
Gets the type of strategy to apply if one of the values is missing.
Returns: the strategy
-
oneMissingTipText
public String oneMissingTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the explorer/experimenter gui
-
commonSubsequence
protected String commonSubsequence(String s1, String s2, boolean forward)
Determines the common subsequence of the two strings.
Parameters:
s1 - the first string
s2 - the second string
forward - if false, the strings are searched from back to front
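One plausible reading of this method is a common prefix (forward) or common suffix (backward) search. The sketch below illustrates that reading only; CommonPartSketch and commonPart are hypothetical names and the actual method may behave differently.

```java
/** Hypothetical helper; not the filter's commonSubsequence implementation. */
class CommonPartSketch {

  /**
   * Returns the longest common prefix of the two strings if forward is
   * true, otherwise their longest common suffix.
   */
  static String commonPart(String s1, String s2, boolean forward) {
    int max = Math.min(s1.length(), s2.length());
    int n = 0;
    if (forward) {
      while (n < max && s1.charAt(n) == s2.charAt(n))
        n++;
      return s1.substring(0, n);
    }
    // compare characters starting from the end of both strings
    while (n < max
        && s1.charAt(s1.length() - 1 - n) == s2.charAt(s2.length() - 1 - n))
      n++;
    return s1.substring(s1.length() - n);
  }

  public static void main(String[] args) {
    System.out.println(commonPart("temp_a", "temp_b", true));
    System.out.println(commonPart("a_score", "b_score", false));
  }
}
```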
-
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it. In case the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, then this method will be called from batchFinished().
Specified by: determineOutputFormat in class weka.filters.SimpleFilter
Parameters: inputFormat - the input format to base the output format on
Returns: the output format
Throws: Exception - in case the determination goes wrong
See Also: SimpleBatchFilter.hasImmediateOutputFormat(), SimpleBatchFilter.batchFinished()
-
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter.
Specified by: getCapabilities in interface weka.core.CapabilitiesHandler
Overrides: getCapabilities in class weka.filters.Filter
Returns: the capabilities of this object
See Also: Capabilities
-
process
protected weka.core.Instances process(weka.core.Instances instances) throws Exception
Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
Specified by: process in class weka.filters.SimpleFilter
Parameters: instances - the data to process
Returns: the modified data
Throws: Exception - in case the processing goes wrong
See Also: SimpleBatchFilter.batchFinished()
-
getRevision
public String getRevision()
Returns the revision string.
Specified by: getRevision in interface weka.core.RevisionHandler
Overrides: getRevision in class weka.filters.Filter
Returns: the revision
-
main
public static void main(String[] args)
Runs the filter with the given arguments.
Parameters: args - the commandline arguments