Class RemoveDuplicates
- java.lang.Object
  - weka.filters.Filter
    - weka.filters.SimpleFilter
      - weka.filters.SimpleBatchFilter
        - weka.filters.unsupervised.instance.RemoveDuplicates
- All Implemented Interfaces: Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
public class RemoveDuplicates extends weka.filters.SimpleBatchFilter implements weka.filters.UnsupervisedFilter, weka.core.Randomizable
Removes all duplicate instances.
Valid options are:
-include-class Whether to include the class attribute in the comparison as well.
-randomize Whether to randomize the data after the removal process.
-S <int> Specifies the seed value for randomization. (default: 42)
- Version: $Revision$
- Author: fracpete (fracpete at waikato dot ac dot nz)
- See Also: Serialized Form
-
Field Summary
Modifier and Type, Field, Description:
protected boolean m_IncludeClass - whether to take the class into account.
protected boolean m_Randomize - whether to randomize the data after the removal.
protected int m_Seed - the seed value for the randomization.
-
Constructor Summary
Constructor:
RemoveDuplicates()
-
Method Summary
Modifier and Type, Method, Description:
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) - Determines the output format based on the input format and returns it.
weka.core.Capabilities getCapabilities() - Returns the Capabilities of this filter.
boolean getIncludeClass() - Returns whether to include the class attribute in the comparison.
String[] getOptions() - Gets the current settings of the filter.
boolean getRandomize() - Returns whether to randomize the data after the removal.
String getRevision() - Returns the revision string.
int getSeed() - Gets the seed for the random number generation.
String globalInfo() - Returns a string describing this filter.
String includeClassTipText() - Returns the tip text for this property.
Enumeration listOptions() - Returns an enumeration describing the available options.
static void main(String[] args) - Main method for running this filter.
protected weka.core.Instances process(weka.core.Instances instances) - Processes the given data (may change the provided dataset) and returns the modified version.
String randomizeTipText() - Returns the tip text for this property.
String seedTipText() - Returns the tip text for this property.
void setIncludeClass(boolean value) - Sets whether to include the class attribute in the comparison.
void setOptions(String[] options) - Parses a given list of options.
void setRandomize(boolean value) - Sets whether to randomize the data after the removal.
void setSeed(int value) - Sets the seed for random number generation.
-
Methods inherited from class weka.filters.SimpleBatchFilter
allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input
-
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
-
-
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
- Specified by: globalInfo in class weka.filters.SimpleFilter
- Returns: a description of the filter suitable for displaying in the explorer/experimenter GUI
-
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by: listOptions in interface weka.core.OptionHandler
- Overrides: listOptions in class weka.filters.Filter
- Returns: an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a given list of options.
Valid options are:
-include-class Whether to include the class attribute in the comparison as well.
-randomize Whether to randomize the data after the removal process.
-S <int> Specifies the seed value for randomization. (default: 42)
- Specified by: setOptions in interface weka.core.OptionHandler
- Overrides: setOptions in class weka.filters.Filter
- Parameters: options - the list of options as an array of strings
- Throws: Exception - if an option is not supported.
-
getOptions
public String[] getOptions()
Gets the current settings of the filter.
- Specified by: getOptions in interface weka.core.OptionHandler
- Overrides: getOptions in class weka.filters.Filter
- Returns: an array of strings suitable for passing to setOptions.
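As a rough illustration of the option array that setOptions parses and getOptions returns, the sketch below builds one from the three documented flags. It is standalone Java and does not call Weka; the class `OptionsSketch` and the helper `buildOptions` are hypothetical names introduced for this example, and the exact array that getOptions produces (for instance, whether `-S` is always emitted) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch; OptionsSketch and buildOptions are hypothetical, not part of Weka.
class OptionsSketch {

    /** Builds an option array using the documented flags -include-class, -randomize and -S. */
    static String[] buildOptions(boolean includeClass, boolean randomize, int seed) {
        List<String> opts = new ArrayList<>();
        if (includeClass) {
            opts.add("-include-class");
        }
        if (randomize) {
            opts.add("-randomize");
        }
        // Assumption: the seed is always written out, even when it is the default (42).
        opts.add("-S");
        opts.add(Integer.toString(seed));
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = buildOptions(true, true, 1);
        System.out.println(String.join(" ", opts)); // prints "-include-class -randomize -S 1"
        // With Weka on the classpath, the array could then be applied to the filter:
        //   RemoveDuplicates filter = new RemoveDuplicates();
        //   filter.setOptions(opts);
    }
}
```

Building the array programmatically keeps the flag spelling in one place; the same settings can also be applied through setIncludeClass, setRandomize and setSeed.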
-
setIncludeClass
public void setIncludeClass(boolean value)
Sets whether to include the class attribute in the comparison.
- Parameters: value - if true the class attribute gets included
-
getIncludeClass
public boolean getIncludeClass()
Returns whether to include the class attribute in the comparison.
- Returns: true if the class attribute is included
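The effect of the include-class setting can be sketched without Weka. The example below treats each row as an instance, with the last column standing in for the class attribute. It assumes that two instances count as duplicates when all compared values are equal and that the first occurrence is kept; neither detail is confirmed by this page, and `DedupSketch` and `removeDuplicates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone sketch; it does not use Weka. Rows stand in for instances.
class DedupSketch {

    /**
     * Keeps the first occurrence of each row. The column at classIndex is skipped
     * in the comparison unless includeClass is true.
     */
    static List<String[]> removeDuplicates(List<String[]> rows, int classIndex, boolean includeClass) {
        Set<List<String>> seen = new HashSet<>();
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            // Build the comparison key from the columns that take part in the comparison.
            List<String> key = new ArrayList<>();
            for (int i = 0; i < row.length; i++) {
                if (includeClass || i != classIndex) {
                    key.add(row[i]);
                }
            }
            // Set.add returns false when an equal key was seen before.
            if (seen.add(key)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"sunny", "hot", "yes"},
            new String[] {"sunny", "hot", "no"},
            new String[] {"sunny", "hot", "yes"});
        // Class ignored: all three rows share the key [sunny, hot].
        System.out.println(removeDuplicates(rows, 2, false).size()); // prints 1
        // Class included: the "yes" and "no" rows differ, the repeated "yes" row is dropped.
        System.out.println(removeDuplicates(rows, 2, true).size());  // prints 2
    }
}
```

Leaving the class out of the comparison is the stricter choice: rows with identical attribute values but conflicting class labels collapse into one.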
-
includeClassTipText
public String includeClassTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setRandomize
public void setRandomize(boolean value)
Sets whether to randomize the data after the removal.
- Parameters: value - if true the data gets randomized after the removal
-
getRandomize
public boolean getRandomize()
Returns whether to randomize the data after the removal.
- Returns: true if the data gets randomized after the removal
-
randomizeTipText
public String randomizeTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setSeed
public void setSeed(int value)
Sets the seed for random number generation.
- Specified by: setSeed in interface weka.core.Randomizable
- Parameters: value - the seed
-
getSeed
public int getSeed()
Gets the seed for the random number generation.
- Specified by: getSeed in interface weka.core.Randomizable
- Returns: the seed for the random number generation
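The role of the seed can be shown with plain Java: shuffling with the same seed gives the same order on every run, which is what makes a randomized result repeatable. This sketch assumes a `java.util.Random`-style generator and uses `Collections.shuffle`; how the filter itself shuffles is not stated on this page, and `SeedSketch` and `shuffled` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Standalone sketch; it does not use Weka.
class SeedSketch {

    /** Returns a shuffled copy of data; the input list is left untouched. */
    static List<Integer> shuffled(List<Integer> data, int seed) {
        List<Integer> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        List<Integer> first = shuffled(data, 42);
        List<Integer> second = shuffled(data, 42);
        // Same seed, same order.
        System.out.println(first.equals(second)); // prints true
    }
}
```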
-
seedTipText
public String seedTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by: getCapabilities in interface weka.core.CapabilitiesHandler
- Overrides: getCapabilities in class weka.filters.Filter
- Returns: the capabilities of this object
- See Also: Capabilities
-
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it.
- Specified by: determineOutputFormat in class weka.filters.SimpleFilter
- Parameters: inputFormat - the input format to base the output format on
- Returns: the output format
- Throws: Exception - in case the determination goes wrong
-
process
protected weka.core.Instances process(weka.core.Instances instances) throws Exception
Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
- Specified by: process in class weka.filters.SimpleFilter
- Parameters: instances - the data to process
- Returns: the modified data
- Throws: Exception - in case the processing goes wrong
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by: getRevision in interface weka.core.RevisionHandler
- Overrides: getRevision in class weka.filters.Filter
- Returns: the revision
-
main
public static void main(String[] args)
Main method for running this filter.
- Parameters: args - should contain arguments to the filter: use -h for help
-
-