Class MergeTwoAttributes
- java.lang.Object
- weka.filters.Filter
- weka.filters.SimpleFilter
- weka.filters.SimpleBatchFilter
- weka.filters.unsupervised.attribute.MergeTwoAttributes
- All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
public class MergeTwoAttributes extends weka.filters.SimpleBatchFilter implements weka.filters.UnsupervisedFilter
Merges two attributes and offers various strategies for the cases where their values differ or one of them is not present.
Uses the common subsequence (either from the start or the end) of the two attribute names as the name of the merged attribute; if there is none, the two names are concatenated, separated by '-'. If the new name is already present in the dataset, a number is appended to make it unique.
The merged attribute can either be left at the default position (either first or second attribute, whichever comes first) or moved to a specific one.
If one of the two attributes to be merged is the current class attribute, the newly created merged attribute will become the new class attribute.
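The naming rule described above can be sketched in plain Java. This is a minimal illustration, not the filter's actual code. It assumes that "common subsequence from start or end" means the common prefix or suffix of the two names, that the prefix is tried before the suffix, that the listed characters are stripped from both ends, and that the uniqueness counter starts at 2. The class MergedNameSketch and its methods are hypothetical.

```java
import java.util.Set;

/** Illustrative sketch only; not Weka's actual implementation. */
class MergedNameSketch {

    /** Longest common prefix (forward) or suffix (backward) of two strings. */
    static String common(String s1, String s2, boolean forward) {
        int max = Math.min(s1.length(), s2.length());
        int n = 0;
        while (n < max) {
            char c1 = forward ? s1.charAt(n) : s1.charAt(s1.length() - 1 - n);
            char c2 = forward ? s2.charAt(n) : s2.charAt(s2.length() - 1 - n);
            if (c1 != c2) {
                break;
            }
            n++;
        }
        return forward ? s1.substring(0, n) : s1.substring(s1.length() - n);
    }

    /** Strips any of the given characters from both ends of the name. */
    static String trim(String name, String removeChars) {
        int start = 0;
        int end = name.length();
        while (start < end && removeChars.indexOf(name.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && removeChars.indexOf(name.charAt(end - 1)) >= 0) {
            end--;
        }
        return name.substring(start, end);
    }

    /** Derives a merged name that does not clash with the existing names. */
    static String mergedName(String first, String second,
                             String removeChars, Set<String> existing) {
        // Assumption: try the common prefix first, then the common suffix.
        String name = trim(common(first, second, true), removeChars);
        if (name.isEmpty()) {
            name = trim(common(first, second, false), removeChars);
        }
        // Fallback from the description: concatenate with '-'.
        if (name.isEmpty()) {
            name = first + "-" + second;
        }
        // Assumption: the appended number starts at 2.
        String unique = name;
        int counter = 2;
        while (existing.contains(unique)) {
            unique = name + counter++;
        }
        return unique;
    }
}
```

Under these assumptions, the names temp_sensor1 and temp_sensor2 merge to temp_sensor, a_height and b_height merge to height, and foo and bar fall back to foo-bar.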
Valid options are:
-D Turns on output of debugging information.
-first <att name> The name of the first attribute. (default: att1)
-second <att name> The name of the second attribute. (default: att2)
-remove-chars <chars> The characters to remove from the start/end of the generated name for the merged attribute. (default: -_.)
-merged-index <position> The new position for the merged attribute. An empty string means the default position, i.e., the position of the first or second attribute, whichever comes first. (default: empty string)
-differ <MISSING|AVERAGE|FIRST|SECOND> The strategy to apply in case the values of the two attributes differ. (default: MISSING)
-one-missing <MISSING|PRESENT> The strategy to apply in case one of the two values is missing. (default: MISSING)
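How the -differ and -one-missing strategies might apply to a single pair of numeric values is sketched below. This is an illustration under stated assumptions, not the filter's code: Double.NaN stands in for a missing value, two missing values stay missing, and equal values are passed through unchanged. The class MergeValueSketch and its enums are hypothetical names.

```java
/** Illustrative sketch only; not Weka's actual implementation. */
class MergeValueSketch {

    /** Mirrors the values accepted by the -differ option. */
    enum Differ { MISSING, AVERAGE, FIRST, SECOND }

    /** Mirrors the values accepted by the -one-missing option. */
    enum OneMissing { MISSING, PRESENT }

    /** Merges two numeric values; Double.NaN marks a missing value here. */
    static double merge(double first, double second,
                        Differ differ, OneMissing oneMissing) {
        boolean firstMissing = Double.isNaN(first);
        boolean secondMissing = Double.isNaN(second);

        // Assumption: nothing to merge if both values are missing.
        if (firstMissing && secondMissing) {
            return Double.NaN;
        }
        // Exactly one value is missing: apply the -one-missing strategy.
        if (firstMissing || secondMissing) {
            if (oneMissing == OneMissing.MISSING) {
                return Double.NaN;
            }
            return firstMissing ? second : first;
        }
        // Both present and equal: no conflict to resolve.
        if (first == second) {
            return first;
        }
        // Both present but different: apply the -differ strategy.
        switch (differ) {
            case AVERAGE:
                return (first + second) / 2.0;
            case FIRST:
                return first;
            case SECOND:
                return second;
            default:
                return Double.NaN;
        }
    }
}
```

With the defaults (MISSING for both options), any disagreement or gap between the two attributes produces a missing value, which is the most conservative choice; AVERAGE only makes sense for numeric data.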
- Version:
- $Revision$
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
- static String DEFAULT_NAME_FIRST: the default first attribute name.
- static String DEFAULT_NAME_SECOND: the default second attribute name.
- static String DEFAULT_REMOVE_CHARS: characters to remove from start/end of the merged name.
- protected int m_Differ: how to handle differing values.
- protected String m_FirstAttribute: the name of the first attribute.
- protected String m_Merged: the name of the merged attribute.
- protected weka.core.SingleIndex m_MergedIndex: the position for the merged attribute (empty = leave at default position).
- protected int m_OneMissing: what to do if one value is missing.
- protected String m_RemoveChars: the characters to remove from the merged name (start/end).
- protected String m_SecondAttribute: the name of the second attribute.
- static int ONEMISSING_MISSING: what to do if one is missing: missing.
- static int ONEMISSING_USE_PRESENT: what to do if one is missing: use present value.
- static weka.core.Tag[] TAGS_ONEMISSING
- static weka.core.Tag[] TAGS_VALUESDIFFER: the types of how to handle differing values.
- static int VALUESDIFFER_AVERAGE: how to handle differing values: average.
- static int VALUESDIFFER_FIRST: how to handle differing values: first.
- static int VALUESDIFFER_MISSING: how to handle differing values: missing.
- static int VALUESDIFFER_SECOND: how to handle differing values: second.
-
Constructor Summary
Constructors (Constructor, Description)
- MergeTwoAttributes()
-
Method Summary
Methods (Modifier and Type, Method, Description)
- protected String commonSubsequence(String s1, String s2, boolean forward): Determines the common subsequence of the two strings.
- protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat): Determines the output format based on the input format and returns it.
- String differTipText(): Returns the tip text for this property.
- String firstAttributeTipText(): Returns the tip text for this property.
- weka.core.Capabilities getCapabilities(): Returns the Capabilities of this filter.
- weka.core.SelectedTag getDiffer(): Gets the type of strategy to apply if the two values differ.
- String getFirstAttribute(): Gets the name of the first attribute.
- String getMergedIndex(): Gets the position for the merged attribute.
- weka.core.SelectedTag getOneMissing(): Gets the type of strategy to apply if one of the values is missing.
- String[] getOptions(): Returns the options of the current setup.
- String getRemoveChars(): Gets the characters to remove from start/end of the generated name.
- String getRevision(): Returns the revision string.
- String getSecondAttribute(): Gets the name of the second attribute.
- String globalInfo(): Returns a string describing this filter.
- Enumeration listOptions(): Gets an enumeration describing the available options.
- static void main(String[] args): Runs the filter with the given arguments.
- String mergedIndexTipText(): Returns the tip text for this property.
- String oneMissingTipText(): Returns the tip text for this property.
- protected weka.core.Instances process(weka.core.Instances instances): Processes the given data (may change the provided dataset) and returns the modified version.
- String removeCharsTipText(): Returns the tip text for this property.
- String secondAttributeTipText(): Returns the tip text for this property.
- void setDiffer(weka.core.SelectedTag value): Sets the type of strategy to apply if the two values differ.
- void setFirstAttribute(String value): Sets the name of the first attribute.
- void setMergedIndex(String value): Sets the position for the merged attribute.
- void setOneMissing(weka.core.SelectedTag value): Sets the type of strategy to apply if one of the values is missing.
- void setOptions(String[] options): Parses the options for this object.
- void setRemoveChars(String value): Sets the characters to remove from start/end of the generated name.
- void setSecondAttribute(String value): Sets the name of the second attribute.
-
Methods inherited from class weka.filters.SimpleBatchFilter
allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input, input
-
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
-
Field Detail
-
VALUESDIFFER_MISSING
public static final int VALUESDIFFER_MISSING
how to handle differing values: missing.
- See Also:
- Constant Field Values
-
VALUESDIFFER_AVERAGE
public static final int VALUESDIFFER_AVERAGE
how to handle differing values: average.
- See Also:
- Constant Field Values
-
VALUESDIFFER_FIRST
public static final int VALUESDIFFER_FIRST
how to handle differing values: first.
- See Also:
- Constant Field Values
-
VALUESDIFFER_SECOND
public static final int VALUESDIFFER_SECOND
how to handle differing values: second.
- See Also:
- Constant Field Values
-
TAGS_VALUESDIFFER
public static final weka.core.Tag[] TAGS_VALUESDIFFER
the types of how to handle differing values.
-
ONEMISSING_MISSING
public static final int ONEMISSING_MISSING
what to do if one is missing: missing.
- See Also:
- Constant Field Values
-
ONEMISSING_USE_PRESENT
public static final int ONEMISSING_USE_PRESENT
what to do if one is missing: use present value.
- See Also:
- Constant Field Values
-
TAGS_ONEMISSING
public static final weka.core.Tag[] TAGS_ONEMISSING
-
DEFAULT_REMOVE_CHARS
public static final String DEFAULT_REMOVE_CHARS
characters to remove from start/end of the merged name.
- See Also:
- Constant Field Values
-
DEFAULT_NAME_FIRST
public static final String DEFAULT_NAME_FIRST
the default first attribute name.
- See Also:
- Constant Field Values
-
DEFAULT_NAME_SECOND
public static final String DEFAULT_NAME_SECOND
the default second attribute name.
- See Also:
- Constant Field Values
-
m_FirstAttribute
protected String m_FirstAttribute
the name of the first attribute.
-
m_SecondAttribute
protected String m_SecondAttribute
the name of the second attribute.
-
m_Differ
protected int m_Differ
how to handle differing values.
-
m_OneMissing
protected int m_OneMissing
what to do if one value is missing.
-
m_RemoveChars
protected String m_RemoveChars
the characters to remove from the merged name (start/end).
-
m_MergedIndex
protected weka.core.SingleIndex m_MergedIndex
the position for the merged attribute (empty = leave at default position).
-
m_Merged
protected String m_Merged
the name of the merged attribute.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
- Specified by:
globalInfo in class weka.filters.SimpleFilter
- Returns:
- a description of the filter suitable for displaying in the Explorer/Experimenter GUI
-
listOptions
public Enumeration listOptions()
Gets an enumeration describing the available options.
- Specified by:
listOptions in interface weka.core.OptionHandler
- Overrides:
listOptions in class weka.filters.Filter
- Returns:
- an enumeration of all the available options.
-
getOptions
public String[] getOptions()
Returns the options of the current setup.
- Specified by:
getOptions in interface weka.core.OptionHandler
- Overrides:
getOptions in class weka.filters.Filter
- Returns:
- the current options
-
setOptions
public void setOptions(String[] options) throws Exception
Parses the options for this object.
Valid options are:
-D Turns on output of debugging information.
-first <att name> The name of the first attribute. (default: att1)
-second <att name> The name of the second attribute. (default: att2)
-remove-chars <chars> The characters to remove from the start/end of the generated name for the merged attribute. (default: -_.)
-merged-index <position> The new position for the merged attribute. An empty string means the default position, i.e., the position of the first or second attribute, whichever comes first. (default: empty string)
-differ <MISSING|AVERAGE|FIRST|SECOND> The strategy to apply in case the values of the two attributes differ. (default: MISSING)
-one-missing <MISSING|PRESENT> The strategy to apply in case one of the two values is missing. (default: MISSING)
- Specified by:
setOptions in interface weka.core.OptionHandler
- Overrides:
setOptions in class weka.filters.Filter
- Parameters:
options - the options to use
- Throws:
Exception - if the option setting fails
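As a brief illustration of the option syntax listed above, the following sketch builds the kind of String[] that setOptions(String[]) expects, with each flag followed by its value as a separate array element. The attribute names temp_a and temp_b and the class MergeOptionsSketch are hypothetical; only the flags and their permitted values come from the option list.

```java
/** Illustrative sketch only; builds an option array, does not call Weka. */
class MergeOptionsSketch {

    /** Options for merging two hypothetical attributes, temp_a and temp_b. */
    static String[] exampleOptions() {
        return new String[] {
            "-first", "temp_a",        // hypothetical attribute name
            "-second", "temp_b",       // hypothetical attribute name
            "-differ", "AVERAGE",      // average the values when they differ
            "-one-missing", "PRESENT"  // keep the value that is present
        };
    }
}
```

Options that are left out, such as -remove-chars and -merged-index here, keep the defaults given in the option list.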
-
setFirstAttribute
public void setFirstAttribute(String value)
Sets the name of the first attribute.
- Parameters:
value - the name
-
getFirstAttribute
public String getFirstAttribute()
Gets the name of the first attribute.
- Returns:
- the name
-
firstAttributeTipText
public String firstAttributeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setSecondAttribute
public void setSecondAttribute(String value)
Sets the name of the second attribute.
- Parameters:
value - the name
-
getSecondAttribute
public String getSecondAttribute()
Gets the name of the second attribute.
- Returns:
- the name
-
secondAttributeTipText
public String secondAttributeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setMergedIndex
public void setMergedIndex(String value)
Sets the position for the merged attribute.
- Parameters:
value - the position, empty string for default
-
getMergedIndex
public String getMergedIndex()
Gets the position for the merged attribute.
- Returns:
- the position, empty string for default
-
mergedIndexTipText
public String mergedIndexTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setRemoveChars
public void setRemoveChars(String value)
Sets the characters to remove from start/end of the generated name.
- Parameters:
value - the characters
-
getRemoveChars
public String getRemoveChars()
Gets the characters to remove from start/end of the generated name.
- Returns:
- the characters
-
removeCharsTipText
public String removeCharsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setDiffer
public void setDiffer(weka.core.SelectedTag value)
Sets the type of strategy to apply if the two values differ.
- Parameters:
value - the strategy
-
getDiffer
public weka.core.SelectedTag getDiffer()
Gets the type of strategy to apply if the two values differ.
- Returns:
- the strategy
-
differTipText
public String differTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setOneMissing
public void setOneMissing(weka.core.SelectedTag value)
Sets the type of strategy to apply if one of the values is missing.
- Parameters:
value - the strategy
-
getOneMissing
public weka.core.SelectedTag getOneMissing()
Gets the type of strategy to apply if one of the values is missing.
- Returns:
- the strategy
-
oneMissingTipText
public String oneMissingTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
commonSubsequence
protected String commonSubsequence(String s1, String s2, boolean forward)
Determines the common subsequence of the two strings.
- Parameters:
s1 - the first string
s2 - the second string
forward - if false, the strings are searched from back to front
-
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it. If the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, this method is called from batchFinished().
- Specified by:
determineOutputFormat in class weka.filters.SimpleFilter
- Parameters:
inputFormat - the input format to base the output format on
- Returns:
- the output format
- Throws:
Exception - in case the determination goes wrong
- See Also:
SimpleBatchFilter.hasImmediateOutputFormat(), SimpleBatchFilter.batchFinished()
-
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
getCapabilities in interface weka.core.CapabilitiesHandler
- Overrides:
getCapabilities in class weka.filters.Filter
- Returns:
- the capabilities of this object
- See Also:
Capabilities
-
process
protected weka.core.Instances process(weka.core.Instances instances) throws Exception
Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
- Specified by:
process in class weka.filters.SimpleFilter
- Parameters:
instances - the data to process
- Returns:
- the modified data
- Throws:
Exception - in case the processing goes wrong
- See Also:
SimpleBatchFilter.batchFinished()
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface weka.core.RevisionHandler
- Overrides:
getRevision in class weka.filters.Filter
- Returns:
- the revision
-
main
public static void main(String[] args)
Runs the filter with the given arguments.
- Parameters:
args - the command-line arguments
-
-