Class MergeManyAttributes
java.lang.Object
  weka.filters.Filter
    weka.filters.SimpleFilter
      weka.filters.SimpleBatchFilter
        weka.filters.unsupervised.attribute.MergeManyAttributes
- All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
public class MergeManyAttributes extends weka.filters.SimpleBatchFilter implements weka.filters.UnsupervisedFilter
Merges two or more attributes and offers several strategies for the cases where values differ or are missing.
The name of the merged attribute is the common subsequence of the attribute names (taken either from the start or from the end); if there is none, the names are concatenated, separated by '-'. If the resulting name is already present, a number is appended to make it unique.
The merged attribute can either be left at the default position (that of whichever merged attribute comes first) or moved to a specific position.
If one of the attributes to be merged is the current class attribute, the newly created merged attribute becomes the new class attribute.
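The naming rule described above can be sketched in plain Java. This is a minimal illustration, not the filter's actual code: the class name `MergedNameSketch`, the helper names, the sample attribute names, and the choice to start numbering at 2 are all assumptions. The common part of the names is passed in rather than computed here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergedNameSketch {

  /** Strips any of the given characters from both ends of the name. */
  static String trim(String name, String removeChars) {
    int start = 0;
    int end = name.length();
    while (start < end && removeChars.indexOf(name.charAt(start)) >= 0) {
      start++;
    }
    while (end > start && removeChars.indexOf(name.charAt(end - 1)) >= 0) {
      end--;
    }
    return name.substring(start, end);
  }

  /**
   * Picks the merged name: the trimmed common part if it is non-empty,
   * otherwise the attribute names joined by '-'. A counter is appended
   * until the name no longer clashes with an existing attribute name.
   */
  static String mergedName(String common, List<String> names,
                           String removeChars, Set<String> existing) {
    String base = trim(common, removeChars);
    if (base.isEmpty()) {
      base = String.join("-", names);
    }
    String result = base;
    int counter = 2; // assumption: the real filter may number differently
    while (existing.contains(result)) {
      result = base + counter;
      counter++;
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical attribute names, for illustration only.
    Set<String> existing = new HashSet<>(Arrays.asList("temp", "humidity"));
    System.out.println(mergedName("temp_", Arrays.asList("temp_a", "temp_b"), "-_.", existing));
    System.out.println(mergedName("", Arrays.asList("height", "width"), "-_.", existing));
  }
}
```

With the default remove characters `-_.`, the common part `temp_` is trimmed to `temp`; because `temp` already exists in this example, a number is appended.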
Valid options are:
-D Turns on output of debugging information.
-att-name <att name> The name of the attribute, can be supplied multiple times.
-remove-chars <chars> The characters to remove from the start/end of the generated name for the merged attribute. (default: -_.)
-merged-index <position> The new position for the merged attribute. An empty string means the default position, i.e., the position of whichever merged attribute comes first. (default: empty string)
-differ <MISSING|AVERAGE> The strategy to apply in case the values of the attributes differ. (default: MISSING)
-one-missing <MISSING|PRESENT> The strategy to apply in case one of the values is missing. (default: MISSING)
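How the -differ and -one-missing strategies might combine for a single row can be sketched as follows. This is an assumption-laden illustration, not the filter's implementation: the class `MergeValueSketch` and its enums are hypothetical, missing values are represented as `Double.NaN`, and the sketch assumes that the one-missing strategy is checked before the differ strategy.

```java
public class MergeValueSketch {

  enum Differ { MISSING, AVERAGE }

  enum OneMissing { MISSING, PRESENT }

  /**
   * Merges the values of one row. NaN stands for a missing value.
   * Assumption: if any value is missing, only the one-missing strategy
   * decides the outcome; the differ strategy applies when all are present.
   */
  static double merge(double[] values, Differ differ, OneMissing oneMissing) {
    double firstPresent = Double.NaN;
    boolean anyMissing = false;
    boolean allEqual = true;
    double sum = 0.0;
    int present = 0;
    for (double v : values) {
      if (Double.isNaN(v)) {
        anyMissing = true;
        continue;
      }
      if (present == 0) {
        firstPresent = v;
      } else if (v != firstPresent) {
        allEqual = false;
      }
      sum += v;
      present++;
    }
    if (present == 0) {
      return Double.NaN; // nothing to merge
    }
    if (anyMissing) {
      return oneMissing == OneMissing.PRESENT ? firstPresent : Double.NaN;
    }
    if (allEqual) {
      return firstPresent;
    }
    return differ == Differ.AVERAGE ? sum / present : Double.NaN;
  }

  public static void main(String[] args) {
    System.out.println(merge(new double[] {1.0, 3.0}, Differ.AVERAGE, OneMissing.MISSING));
    System.out.println(merge(new double[] {Double.NaN, 5.0}, Differ.MISSING, OneMissing.PRESENT));
  }
}
```

Averaging only makes sense for numeric values; how the filter treats nominal attributes under AVERAGE is not stated here and is not modelled.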
- Version:
- $Revision$
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static String | DEFAULT_REMOVE_CHARS | characters to remove from start/end of the merged name.
protected BaseString[] | m_AttributeNames | the attribute names.
protected int | m_Differ | how to handle differing values.
protected String | m_Merged | the name of the merged attribute.
protected weka.core.SingleIndex | m_MergedIndex | the position for the merged attribute (empty = leave at default position).
protected int | m_OneMissing | what to do if one value is missing.
protected String | m_RemoveChars | the characters to remove from the merged name (start/end).
static int | ONEMISSING_MISSING | what to do if one is missing: missing.
static int | ONEMISSING_USE_FIRST_PRESENT | what to do if one is missing: use first present value.
static weka.core.Tag[] | TAGS_ONEMISSING |
static weka.core.Tag[] | TAGS_VALUESDIFFER | the types of how to handle differing values.
static int | VALUESDIFFER_AVERAGE | how to handle differing values: average.
static int | VALUESDIFFER_MISSING | how to handle differing values: missing.
-
Constructor Summary
Constructors
Constructor | Description
MergeManyAttributes() |
-
Method Summary
Modifier and Type | Method | Description
String | attributeNamesTipText() | Returns the tip text for this property.
protected String | commonSubsequence(String s1, String s2, boolean forward) | Determines the common subsequence of the two strings.
protected weka.core.Instances | determineOutputFormat(weka.core.Instances inputFormat) | Determines the output format based on the input format and returns it.
String | differTipText() | Returns the tip text for this property.
BaseString[] | getAttributeNames() | Gets the names of the attributes.
weka.core.Capabilities | getCapabilities() | Returns the Capabilities of this filter.
weka.core.SelectedTag | getDiffer() | Gets the type of strategy to apply if the values differ.
String | getMergedIndex() | Gets the position for the merged attribute.
weka.core.SelectedTag | getOneMissing() | Gets the type of strategy to apply if one of the values is missing.
String[] | getOptions() | Returns the options of the current setup.
String | getRemoveChars() | Gets the characters to remove from start/end of the generated name.
String | getRevision() | Returns the revision string.
String | globalInfo() | Returns a string describing this filter.
Enumeration | listOptions() | Gets an enumeration describing the available options.
static void | main(String[] args) | Runs the filter with the given arguments.
String | mergedIndexTipText() | Returns the tip text for this property.
String | oneMissingTipText() | Returns the tip text for this property.
protected weka.core.Instances | process(weka.core.Instances instances) | Processes the given data (may change the provided dataset) and returns the modified version.
String | removeCharsTipText() | Returns the tip text for this property.
void | setAttributeNames(BaseString[] value) | Sets the names of the attributes.
void | setDiffer(weka.core.SelectedTag value) | Sets the type of strategy to apply if the values differ.
void | setMergedIndex(String value) | Sets the position for the merged attribute.
void | setOneMissing(weka.core.SelectedTag value) | Sets the type of strategy to apply if one of the values is missing.
void | setOptions(String[] options) | Parses the options for this object.
void | setRemoveChars(String value) | Sets the characters to remove from start/end of the generated name.
-
Methods inherited from class weka.filters.SimpleBatchFilter
allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input, input
-
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
-
-
-
-
Field Detail
-
VALUESDIFFER_MISSING
public static final int VALUESDIFFER_MISSING
how to handle differing values: missing.
- See Also:
- Constant Field Values
-
VALUESDIFFER_AVERAGE
public static final int VALUESDIFFER_AVERAGE
how to handle differing values: average.
- See Also:
- Constant Field Values
-
TAGS_VALUESDIFFER
public static final weka.core.Tag[] TAGS_VALUESDIFFER
the types of how to handle differing values.
-
ONEMISSING_MISSING
public static final int ONEMISSING_MISSING
what to do if one is missing: missing.
- See Also:
- Constant Field Values
-
ONEMISSING_USE_FIRST_PRESENT
public static final int ONEMISSING_USE_FIRST_PRESENT
what to do if one is missing: use first present value.
- See Also:
- Constant Field Values
-
TAGS_ONEMISSING
public static final weka.core.Tag[] TAGS_ONEMISSING
-
DEFAULT_REMOVE_CHARS
public static final String DEFAULT_REMOVE_CHARS
characters to remove from start/end of the merged name.
- See Also:
- Constant Field Values
-
m_AttributeNames
protected BaseString[] m_AttributeNames
the attribute names.
-
m_Differ
protected int m_Differ
how to handle differing values.
-
m_OneMissing
protected int m_OneMissing
what to do if one value is missing.
-
m_RemoveChars
protected String m_RemoveChars
the characters to remove from the merged name (start/end).
-
m_MergedIndex
protected weka.core.SingleIndex m_MergedIndex
the position for the merged attribute (empty = leave at default position).
-
m_Merged
protected String m_Merged
the name of the merged attribute.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
- Specified by:
- globalInfo in class weka.filters.SimpleFilter
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter gui
-
listOptions
public Enumeration listOptions()
Gets an enumeration describing the available options.
- Specified by:
- listOptions in interface weka.core.OptionHandler
- Overrides:
- listOptions in class weka.filters.Filter
- Returns:
- an enumeration of all the available options.
-
getOptions
public String[] getOptions()
Returns the options of the current setup.
- Specified by:
- getOptions in interface weka.core.OptionHandler
- Overrides:
- getOptions in class weka.filters.Filter
- Returns:
- the current options
-
setOptions
public void setOptions(String[] options) throws Exception
Parses the options for this object.
Valid options are:
-D Turns on output of debugging information.
-att-name <att name> The name of the attribute, can be supplied multiple times.
-remove-chars <chars> The characters to remove from the start/end of the generated name for the merged attribute. (default: -_.)
-merged-index <position> The new position for the merged attribute. An empty string means the default position, i.e., the position of whichever merged attribute comes first. (default: empty string)
-differ <MISSING|AVERAGE> The strategy to apply in case the values of the attributes differ. (default: MISSING)
-one-missing <MISSING|PRESENT> The strategy to apply in case one of the values is missing. (default: MISSING)
- Specified by:
- setOptions in interface weka.core.OptionHandler
- Overrides:
- setOptions in class weka.filters.Filter
- Parameters:
- options - the options to use
- Throws:
- Exception - if the option setting fails
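Because -att-name can be supplied multiple times, the option array repeats the flag once per attribute. The following sketch only assembles such an array; the class `MergeOptionsSketch`, the method `buildOptions`, and the attribute names `temp_a` and `temp_b` are hypothetical and used purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeOptionsSketch {

  /** Builds an option array using the flags listed for setOptions. */
  static String[] buildOptions(List<String> attNames, String differ,
                               String oneMissing, String mergedIndex) {
    List<String> opts = new ArrayList<>();
    for (String name : attNames) {
      opts.add("-att-name"); // one flag per attribute to merge
      opts.add(name);
    }
    opts.add("-differ");
    opts.add(differ);
    opts.add("-one-missing");
    opts.add(oneMissing);
    if (!mergedIndex.isEmpty()) {
      // Omitted when empty, so the default position is kept.
      opts.add("-merged-index");
      opts.add(mergedIndex);
    }
    return opts.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] opts = buildOptions(Arrays.asList("temp_a", "temp_b"), "AVERAGE", "PRESENT", "1");
    System.out.println(String.join(" ", opts));
  }
}
```

An array built this way would be the kind of argument passed to setOptions(String[]) on a MergeManyAttributes instance.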
-
setAttributeNames
public void setAttributeNames(BaseString[] value)
Sets the names of the attributes.
- Parameters:
- value - the names
-
getAttributeNames
public BaseString[] getAttributeNames()
Gets the names of the attributes.
- Returns:
- the names
-
attributeNamesTipText
public String attributeNamesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMergedIndex
public void setMergedIndex(String value)
Sets the position for the merged attribute.
- Parameters:
- value - the position, empty string for default
-
getMergedIndex
public String getMergedIndex()
Gets the position for the merged attribute.
- Returns:
- the position, empty string for default
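The way a position string could be turned into a 0-based attribute index is sketched below. It assumes the string follows the conventions commonly used by weka.core.SingleIndex ("first", "last", or a 1-based number) and that an empty string selects the smallest index among the merged attributes; the class `MergedPositionSketch` and the method `resolve` are hypothetical.

```java
public class MergedPositionSketch {

  /**
   * Resolves the target position of the merged attribute.
   *
   * @param mergedIndex   "" for the default, "first", "last", or a 1-based number
   * @param sourceIndices 0-based positions of the attributes being merged
   * @param numAttributes number of attributes in the output dataset
   * @return the 0-based position for the merged attribute
   */
  static int resolve(String mergedIndex, int[] sourceIndices, int numAttributes) {
    if (mergedIndex.isEmpty()) {
      // Default: the position of whichever merged attribute comes first.
      int min = Integer.MAX_VALUE;
      for (int index : sourceIndices) {
        min = Math.min(min, index);
      }
      return min;
    }
    if (mergedIndex.equals("first")) {
      return 0;
    }
    if (mergedIndex.equals("last")) {
      return numAttributes - 1;
    }
    int position = Integer.parseInt(mergedIndex) - 1; // 1-based to 0-based
    if (position < 0 || position >= numAttributes) {
      throw new IllegalArgumentException("Position out of range: " + mergedIndex);
    }
    return position;
  }

  public static void main(String[] args) {
    System.out.println(resolve("", new int[] {4, 2, 7}, 10));
    System.out.println(resolve("last", new int[] {4, 2, 7}, 10));
  }
}
```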
-
mergedIndexTipText
public String mergedIndexTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setRemoveChars
public void setRemoveChars(String value)
Sets the characters to remove from start/end of the generated name.
- Parameters:
- value - the characters
-
getRemoveChars
public String getRemoveChars()
Gets the characters to remove from start/end of the generated name.
- Returns:
- the characters
-
removeCharsTipText
public String removeCharsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDiffer
public void setDiffer(weka.core.SelectedTag value)
Sets the type of strategy to apply if the values differ.
- Parameters:
- value - the strategy
-
getDiffer
public weka.core.SelectedTag getDiffer()
Gets the type of strategy to apply if the values differ.
- Returns:
- the strategy
-
differTipText
public String differTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setOneMissing
public void setOneMissing(weka.core.SelectedTag value)
Sets the type of strategy to apply if one of the values is missing.
- Parameters:
- value - the strategy
-
getOneMissing
public weka.core.SelectedTag getOneMissing()
Gets the type of strategy to apply if one of the values is missing.
- Returns:
- the strategy
-
oneMissingTipText
public String oneMissingTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
commonSubsequence
protected String commonSubsequence(String s1, String s2, boolean forward)
Determines the common subsequence of the two strings.
- Parameters:
- s1 - the first string
- s2 - the second string
- forward - if false, the strings are searched from back to front
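A minimal sketch of what such a method might do, assuming that "common subsequence" here means the contiguous run of characters shared from the start (forward) or from the end (backward) of both strings. That reading is suggested by the `forward` flag but is not confirmed by this page; the class name `CommonSubsequenceSketch` and the sample strings are hypothetical.

```java
public class CommonSubsequenceSketch {

  /**
   * Returns the shared leading part of s1 and s2 if forward is true,
   * otherwise the shared trailing part.
   */
  static String commonSubsequence(String s1, String s2, boolean forward) {
    int max = Math.min(s1.length(), s2.length());
    int n = 0;
    if (forward) {
      // Compare characters from the front until they differ.
      while (n < max && s1.charAt(n) == s2.charAt(n)) {
        n++;
      }
      return s1.substring(0, n);
    }
    // Compare characters from the back until they differ.
    while (n < max
        && s1.charAt(s1.length() - 1 - n) == s2.charAt(s2.length() - 1 - n)) {
      n++;
    }
    return s1.substring(s1.length() - n);
  }

  public static void main(String[] args) {
    System.out.println(commonSubsequence("temp_a", "temp_b", true));
    System.out.println(commonSubsequence("a_temp", "b_temp", false));
  }
}
```

For more than two attribute names, the result for the first pair could be compared against each remaining name in turn.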
-
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it. If the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, this method is called from batchFinished().
- Specified by:
- determineOutputFormat in class weka.filters.SimpleFilter
- Parameters:
- inputFormat - the input format to base the output format on
- Returns:
- the output format
- Throws:
- Exception - in case the determination goes wrong
- See Also:
- SimpleBatchFilter.hasImmediateOutputFormat(), SimpleBatchFilter.batchFinished()
-
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
- getCapabilities in interface weka.core.CapabilitiesHandler
- Overrides:
- getCapabilities in class weka.filters.Filter
- Returns:
- the capabilities of this object
- See Also:
Capabilities
-
process
protected weka.core.Instances process(weka.core.Instances instances) throws Exception
Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
- Specified by:
- process in class weka.filters.SimpleFilter
- Parameters:
- instances - the data to process
- Returns:
- the modified data
- Throws:
- Exception - in case the processing goes wrong
- See Also:
- SimpleBatchFilter.batchFinished()
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
- getRevision in interface weka.core.RevisionHandler
- Overrides:
- getRevision in class weka.filters.Filter
- Returns:
- the revision
-
main
public static void main(String[] args)
Runs the filter with the given arguments.
- Parameters:
- args - the commandline arguments
-
-