weka.filters.unsupervised.instance
Class RemoveDuplicates

java.lang.Object
  extended by weka.filters.Filter
      extended by weka.filters.SimpleFilter
          extended by weka.filters.SimpleBatchFilter
              extended by weka.filters.unsupervised.instance.RemoveDuplicates
All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter

public class RemoveDuplicates
extends weka.filters.SimpleBatchFilter
implements weka.filters.UnsupervisedFilter, weka.core.Randomizable

Removes all duplicate instances.

Valid options are:

 -include-class
  Whether to include the class attribute in the comparison as well.
 
 -randomize
  Whether to randomize the data after the removal process.
 
 -S <int>
  Specifies the seed value for randomization.
  (default: 42)
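
The effect of the three options can be illustrated without Weka. The following is a minimal sketch in plain Java, not the filter's actual implementation: the class name DedupSketch, the method removeDuplicates, and the convention that the last column stands in for the class attribute are all assumptions made for illustration. It also assumes that the first occurrence of each duplicate is the one kept, which this page does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; this is not part of Weka.
public class DedupSketch {

    /**
     * Keeps the first occurrence of each row. The last column plays the
     * role of the class attribute (an assumption of this sketch) and is
     * ignored in the comparison unless includeClass is true.
     */
    public static List<double[]> removeDuplicates(List<double[]> rows,
                                                  boolean includeClass,
                                                  boolean randomize,
                                                  long seed) {
        Set<List<Double>> seen = new LinkedHashSet<>();
        List<double[]> result = new ArrayList<>();
        for (double[] row : rows) {
            // Build the comparison key from the columns that take part.
            int len = includeClass ? row.length : row.length - 1;
            List<Double> key = new ArrayList<>();
            for (int i = 0; i < len; i++) {
                key.add(row[i]);
            }
            // Set.add returns false if an equal key was already present.
            if (seen.add(key)) {
                result.add(row);
            }
        }
        if (randomize) {
            // A fixed seed makes the shuffled order reproducible.
            Collections.shuffle(result, new Random(seed));
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[] {1, 2, 0},
            new double[] {1, 2, 1},   // same attributes, different class
            new double[] {3, 4, 0},
            new double[] {1, 2, 0});  // exact duplicate of the first row
        System.out.println(removeDuplicates(rows, false, false, 42).size()); // prints 2
        System.out.println(removeDuplicates(rows, true, false, 42).size());  // prints 3
    }
}
```

In this sketch, leaving the class column out of the comparison treats rows that differ only in their class value as duplicates, which is why the first call keeps fewer rows than the second.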
 

Version:
$Revision: 4521 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Field Summary
protected  boolean m_IncludeClass
          whether to take the class into account.
protected  boolean m_Randomize
          whether to randomize the data after the removal.
protected  int m_Seed
          the seed value for the randomization.
 
Fields inherited from class weka.filters.SimpleFilter
m_Debug
 
Fields inherited from class weka.filters.Filter
m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
 
Constructor Summary
RemoveDuplicates()
           
 
Method Summary
protected  weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
          Determines the output format based on the input format and returns it.
 weka.core.Capabilities getCapabilities()
          Returns the Capabilities of this filter.
 boolean getIncludeClass()
          Returns whether to include the class attribute in the comparison.
 String[] getOptions()
          Gets the current settings of the filter.
 boolean getRandomize()
          Returns whether to randomize the data after the removal.
 String getRevision()
          Returns the revision string.
 int getSeed()
          Gets the seed for the random number generation.
 String globalInfo()
          Returns a string describing this filter.
 String includeClassTipText()
          Returns the tip text for this property.
 Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(String[] args)
          Main method for running this filter.
protected  weka.core.Instances process(weka.core.Instances instances)
          Processes the given data (may change the provided dataset) and returns the modified version.
 String randomizeTipText()
          Returns the tip text for this property.
 String seedTipText()
          Returns the tip text for this property.
 void setIncludeClass(boolean value)
          Sets whether to include the class attribute in the comparison.
 void setOptions(String[] options)
          Parses a given list of options.
 void setRandomize(boolean value)
          Sets whether to randomize the data after the removal.
 void setSeed(int value)
          Sets the seed for random number generation.
 
Methods inherited from class weka.filters.SimpleBatchFilter
batchFinished, hasImmediateOutputFormat, input
 
Methods inherited from class weka.filters.SimpleFilter
debugTipText, getDebug, reset, setDebug, setInputFormat
 
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, filterFile, flushInput, getCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, push, resetQueue, runFilter, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_IncludeClass

protected boolean m_IncludeClass
whether to take the class into account.


m_Randomize

protected boolean m_Randomize
whether to randomize the data after the removal.


m_Seed

protected int m_Seed
the seed value for the randomization.

Constructor Detail

RemoveDuplicates

public RemoveDuplicates()
Method Detail

globalInfo

public String globalInfo()
Returns a string describing this filter.

Specified by:
globalInfo in class weka.filters.SimpleFilter
Returns:
a description of the filter suitable for displaying in the explorer/experimenter gui

listOptions

public Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface weka.core.OptionHandler
Overrides:
listOptions in class weka.filters.SimpleFilter
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(String[] options)
                throws Exception
Parses a given list of options.

Valid options are:

 -include-class
  Whether to include the class attribute in the comparison as well.
 
 -randomize
  Whether to randomize the data after the removal process.
 
 -S <int>
  Specifies the seed value for randomization.
  (default: 42)
 

Specified by:
setOptions in interface weka.core.OptionHandler
Overrides:
setOptions in class weka.filters.SimpleFilter
Parameters:
options - the list of options as an array of strings.
Throws:
Exception - if an option is not supported.

getOptions

public String[] getOptions()
Gets the current settings of the filter.

Specified by:
getOptions in interface weka.core.OptionHandler
Overrides:
getOptions in class weka.filters.SimpleFilter
Returns:
an array of strings suitable for passing to setOptions.
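
The relationship between getOptions and setOptions is a round trip: the array produced by one can be parsed by the other. The sketch below shows that idea in plain Java using the three flags documented above. It is an illustration under stated assumptions, not Weka's option-handling code: the class OptionsSketch and its methods toOptions and parse are hypothetical, and whether the real filter always emits -S in its option array is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not part of Weka.
public class OptionsSketch {
    public boolean includeClass = false;
    public boolean randomize = false;
    public int seed = 42; // documented default for -S

    /** Builds an option array from the current settings. */
    public String[] toOptions() {
        List<String> out = new ArrayList<>();
        if (includeClass) {
            out.add("-include-class");
        }
        if (randomize) {
            out.add("-randomize");
        }
        // Assumption of this sketch: the seed is always written out.
        out.add("-S");
        out.add(Integer.toString(seed));
        return out.toArray(new String[0]);
    }

    /** Parses an option array; rejects flags this sketch does not know. */
    public static OptionsSketch parse(String[] options) {
        OptionsSketch s = new OptionsSketch();
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-include-class":
                    s.includeClass = true;
                    break;
                case "-randomize":
                    s.randomize = true;
                    break;
                case "-S":
                    if (i + 1 >= options.length) {
                        throw new IllegalArgumentException("-S needs an integer value");
                    }
                    s.seed = Integer.parseInt(options[++i]);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        OptionsSketch s = new OptionsSketch();
        s.randomize = true;
        s.seed = 7;
        // Round trip: settings -> option array -> settings.
        OptionsSketch copy = parse(s.toOptions());
        System.out.println(String.join(" ", copy.toOptions()));
    }
}
```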

setIncludeClass

public void setIncludeClass(boolean value)
Sets whether to include the class attribute in the comparison.

Parameters:
value - if true the class attribute gets included

getIncludeClass

public boolean getIncludeClass()
Returns whether to include the class attribute in the comparison.

Returns:
true if the class attribute is included

includeClassTipText

public String includeClassTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setRandomize

public void setRandomize(boolean value)
Sets whether to randomize the data after the removal.

Parameters:
value - if true the data gets randomized after the removal

getRandomize

public boolean getRandomize()
Returns whether to randomize the data after the removal.

Returns:
true if the data gets randomized after the removal

randomizeTipText

public String randomizeTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(int value)
Sets the seed for random number generation.

Specified by:
setSeed in interface weka.core.Randomizable
Parameters:
value - the seed

getSeed

public int getSeed()
Gets the seed for the random number generation.

Specified by:
getSeed in interface weka.core.Randomizable
Returns:
the seed for the random number generation

seedTipText

public String seedTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getCapabilities

public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter.

Specified by:
getCapabilities in interface weka.core.CapabilitiesHandler
Overrides:
getCapabilities in class weka.filters.Filter
Returns:
the capabilities of this object
See Also:
Capabilities

determineOutputFormat

protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
                                             throws Exception
Determines the output format based on the input format and returns it.

Specified by:
determineOutputFormat in class weka.filters.SimpleFilter
Parameters:
inputFormat - the input format to base the output format on
Returns:
the output format
Throws:
Exception - in case the determination goes wrong

process

protected weka.core.Instances process(weka.core.Instances instances)
                               throws Exception
Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().

Specified by:
process in class weka.filters.SimpleFilter
Parameters:
instances - the data to process
Returns:
the modified data
Throws:
Exception - in case the processing goes wrong

getRevision

public String getRevision()
Returns the revision string.

Specified by:
getRevision in interface weka.core.RevisionHandler
Overrides:
getRevision in class weka.filters.Filter
Returns:
the revision

main

public static void main(String[] args)
Main method for running this filter.

Parameters:
args - should contain arguments to the filter: use -h for help


Copyright © 2012 University of Waikato, Hamilton, NZ. All Rights Reserved.