Class RemoveDuplicates

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter

    public class RemoveDuplicates
    extends weka.filters.SimpleBatchFilter
    implements weka.filters.UnsupervisedFilter, weka.core.Randomizable
    Removes all duplicate instances.

    Valid options are:

     -include-class
      Whether to include the class attribute in the comparison as well.
     
     -randomize
      Whether to randomize the data after the removal process.
     
     -S <int>
      Specifies the seed value for randomization.
      (default: 42)
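
The three options above can be illustrated with a small plain-Java sketch. It is not the filter's implementation and does not use the Weka API: the class `DedupSketch`, its method signature, and the choice to keep the first occurrence of each duplicate are all assumptions made for illustration, and the shuffled order from `java.util.Random` will not match the order Weka produces for the same seed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical plain-Java illustration; this is not the Weka filter. */
final class DedupSketch {

    /**
     * Removes duplicate rows, keeping the first occurrence of each (assumed).
     * classIndex marks the column treated as the class attribute.
     */
    static List<double[]> removeDuplicates(List<double[]> rows, int classIndex,
                                           boolean includeClass,
                                           boolean randomize, int seed) {
        Set<List<Double>> seen = new HashSet<>();
        List<double[]> result = new ArrayList<>();
        for (double[] row : rows) {
            // Build the comparison key, skipping the class column
            // unless includeClass is set (mirrors -include-class).
            List<Double> key = new ArrayList<>();
            for (int i = 0; i < row.length; i++) {
                if (i == classIndex && !includeClass) {
                    continue;
                }
                key.add(row[i]);
            }
            if (seen.add(key)) {
                result.add(row);
            }
        }
        // Mirrors -randomize and -S: shuffle only after the removal.
        if (randomize) {
            Collections.shuffle(result, new Random(seed));
        }
        return result;
    }

    public static void main(String[] args) {
        // Last column (index 2) plays the role of the class attribute.
        List<double[]> data = Arrays.asList(
            new double[] {1.0, 2.0, 0.0},
            new double[] {1.0, 2.0, 1.0},
            new double[] {3.0, 4.0, 0.0},
            new double[] {1.0, 2.0, 0.0});
        System.out.println(removeDuplicates(data, 2, false, false, 42).size()); // prints 2
        System.out.println(removeDuplicates(data, 2, true, false, 42).size());  // prints 3
    }
}
```

With the class column ignored, the first, second, and fourth rows compare equal, so two rows remain. With the class column included, only the first and fourth rows are duplicates, so three rows remain.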
     
    Version:
    $Revision$
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected boolean m_IncludeClass
      whether to take the class into account.
      protected boolean m_Randomize
      whether to randomize the data after the removal.
      protected int m_Seed
      the seed value for the randomization.
      • Fields inherited from class weka.filters.Filter

        m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
    • Method Summary

      Modifier and Type Method Description
      protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns it.
      weka.core.Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      boolean getIncludeClass()
      Returns whether to include the class attribute in the comparison.
      String[] getOptions()
      Gets the current settings of the filter.
      boolean getRandomize()
      Returns whether to randomize the data after the removal.
      String getRevision()
      Returns the revision string.
      int getSeed()
      Gets the seed for random number generation.
      String globalInfo()
      Returns a string describing this filter.
      String includeClassTipText()
      Returns the tip text for this property.
      Enumeration listOptions()
      Returns an enumeration describing the available options.
      static void main​(String[] args)
      Main method for running this filter.
      protected weka.core.Instances process​(weka.core.Instances instances)
      Processes the given data (may change the provided dataset) and returns the modified version.
      String randomizeTipText()
      Returns the tip text for this property.
      String seedTipText()
      Returns the tip text for this property.
      void setIncludeClass​(boolean value)
      Sets whether to include the class attribute in the comparison.
      void setOptions​(String[] options)
      Parses a given list of options.
      void setRandomize​(boolean value)
      Sets whether to randomize the data after the removal.
      void setSeed​(int value)
      Set the seed for random number generation.
      • Methods inherited from class weka.filters.SimpleBatchFilter

        allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input, input
      • Methods inherited from class weka.filters.SimpleFilter

        reset, setInputFormat
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • m_IncludeClass

        protected boolean m_IncludeClass
        whether to take the class into account.
      • m_Randomize

        protected boolean m_Randomize
        whether to randomize the data after the removal.
      • m_Seed

        protected int m_Seed
        the seed value for the randomization.
    • Constructor Detail

      • RemoveDuplicates

        public RemoveDuplicates()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions​(String[] options)
                        throws Exception
        Parses a given list of options.

        Valid options are:

         -include-class
          Whether to include the class attribute in the comparison as well.
         
         -randomize
          Whether to randomize the data after the removal process.
         
         -S <int>
          Specifies the seed value for randomization.
          (default: 42)
         
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the list of options as an array of strings.
        Throws:
        Exception - if an option is not supported.
      • getOptions

        public String[] getOptions()
        Gets the current settings of the filter.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        an array of strings suitable for passing to setOptions.
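
The relationship between `setOptions` and `getOptions` can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the filter's code: the class `OptionsSketch` is hypothetical, it handles only the three options listed above and none of the options inherited from `weka.filters.Filter`, it throws an unchecked `IllegalArgumentException` where the real method declares `Exception`, and always emitting `-S` from `getOptions` is a guess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical option holder; mirrors only the three documented flags. */
final class OptionsSketch {
    boolean includeClass = false;
    boolean randomize = false;
    int seed = 42; // documented default for -S

    /** Parses the options, resetting anything not given to its default. */
    void setOptions(String[] options) {
        includeClass = false;
        randomize = false;
        seed = 42;
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-include-class":
                    includeClass = true;
                    break;
                case "-randomize":
                    randomize = true;
                    break;
                case "-S":
                    if (i + 1 >= options.length) {
                        throw new IllegalArgumentException("-S requires an integer");
                    }
                    seed = Integer.parseInt(options[++i]);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
    }

    /** Returns the current settings in a form that setOptions accepts. */
    String[] getOptions() {
        List<String> result = new ArrayList<>();
        if (includeClass) {
            result.add("-include-class");
        }
        if (randomize) {
            result.add("-randomize");
        }
        result.add("-S");
        result.add(Integer.toString(seed));
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        OptionsSketch sketch = new OptionsSketch();
        sketch.setOptions(new String[] {"-randomize", "-S", "7"});
        // Round trip: the array returned here can be passed back to setOptions.
        System.out.println(Arrays.toString(sketch.getOptions())); // prints [-randomize, -S, 7]
    }
}
```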
      • setIncludeClass

        public void setIncludeClass​(boolean value)
        Sets whether to include the class attribute in the comparison.
        Parameters:
        value - if true the class attribute gets included
      • getIncludeClass

        public boolean getIncludeClass()
        Returns whether to include the class attribute in the comparison.
        Returns:
        true if the class attribute is included
      • includeClassTipText

        public String includeClassTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setRandomize

        public void setRandomize​(boolean value)
        Sets whether to randomize the data after the removal.
        Parameters:
        value - if true the data gets randomized after the removal
      • getRandomize

        public boolean getRandomize()
        Returns whether to randomize the data after the removal.
        Returns:
        true if the data gets randomized after the removal
      • randomizeTipText

        public String randomizeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSeed

        public void setSeed​(int value)
        Set the seed for random number generation.
        Specified by:
        setSeed in interface weka.core.Randomizable
        Parameters:
        value - the seed
      • getSeed

        public int getSeed()
        Gets the seed for random number generation.
        Specified by:
        getSeed in interface weka.core.Randomizable
        Returns:
        the seed for the random number generation
      • seedTipText

        public String seedTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the Capabilities of this filter.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this object
        See Also:
        Capabilities
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns it.
        Specified by:
        determineOutputFormat in class weka.filters.SimpleFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
      • process

        protected weka.core.Instances process​(weka.core.Instances instances)
                                       throws Exception
        Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
        Specified by:
        process in class weka.filters.SimpleFilter
        Parameters:
        instances - the data to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong
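
Because `process` is called in `batchFinished()`, the filter sees a whole batch at once rather than one instance at a time. The plain-Java sketch below illustrates that buffering pattern only. The classes `BatchSketch` and `DedupBatch` are hypothetical, work on strings rather than `weka.core.Instances`, and leave out everything else the real base classes do, such as output format handling and the treatment of input that arrives after the first batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical stand-in for a batch filter; not the Weka base class. */
abstract class BatchSketch<T> {
    private final List<T> buffer = new ArrayList<>();
    private List<T> output = new ArrayList<>();

    /** Buffers one row; nothing is available for output yet. */
    boolean input(T row) {
        buffer.add(row);
        return false;
    }

    /** Signals the end of the batch; this is where process() runs. */
    boolean batchFinished() {
        output = process(new ArrayList<>(buffer));
        buffer.clear();
        return !output.isEmpty();
    }

    /** Returns the rows produced by the last finished batch. */
    List<T> output() {
        return output;
    }

    /** Transforms the complete batch and returns the result. */
    protected abstract List<T> process(List<T> rows);

    public static void main(String[] args) {
        DedupBatch filter = new DedupBatch();
        filter.input("a");
        filter.input("b");
        filter.input("a");
        filter.batchFinished();
        System.out.println(filter.output()); // prints [a, b]
    }
}

/** Removes duplicate rows from the batch, keeping first occurrences. */
final class DedupBatch extends BatchSketch<String> {
    @Override
    protected List<String> process(List<String> rows) {
        // LinkedHashSet drops repeats while preserving insertion order.
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }
}
```

Duplicate removal needs this pattern because whether a row is a duplicate depends on every row before it, so no output can be produced until the full batch has been seen.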
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main​(String[] args)
        Main method for running this filter.
        Parameters:
        args - should contain arguments to the filter: use -h for help