Class RemoveMisclassifiedAbs

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter

    public class RemoveMisclassifiedAbs
    extends weka.filters.Filter
    implements weka.filters.UnsupervisedFilter, weka.core.OptionHandler
    A filter that removes instances which are incorrectly classified. Useful for removing outliers.

    Valid options are:

     -W <classifier specification>
      Full class name of classifier to use, followed
      by scheme options. eg:
       "weka.classifiers.bayes.NaiveBayes -D"
      (default: weka.classifiers.rules.ZeroR)
     -C <class index>
      Attribute on which misclassifications are based.
      If < 0, the currently set class is used if there is one; otherwise the last attribute is used.
     -F <number of folds>
      The number of folds to use for cross-validation cleansing.
      (<2 = no cross-validation - default).
     -T <threshold>
      Threshold for the max error when predicting numeric class.
      (Value should be >= 0, default = 0.1).
     -I <max iterations>
      The maximum number of cleansing iterations to perform.
      (<1 = until fully cleansed - default)
     -V
      Invert the match so that correctly classified instances are discarded.
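
    The flags above can be assembled into the String[] form that setOptions accepts. The following is a minimal sketch in plain Java, assuming a hypothetical helper class CleansingOptionsSketch that is not part of Weka; it only builds the array and does not call the filter. The value -1 for the class index is an illustrative choice for "< 0".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka): assembles the option flags listed above. */
final class CleansingOptionsSketch {

    /** Builds an option array in the order -W, -C, -F, -T, -I, -V. */
    static String[] build(String classifierSpec, int classIndex, int folds,
                          double threshold, int maxIterations, boolean invert) {
        List<String> opts = new ArrayList<>();
        opts.add("-W");
        opts.add(classifierSpec);                  // class name followed by scheme options
        opts.add("-C");
        opts.add(String.valueOf(classIndex));      // < 0: fall back to set class or last attribute
        opts.add("-F");
        opts.add(String.valueOf(folds));           // < 2: no cross-validation
        opts.add("-T");
        opts.add(String.valueOf(threshold));       // should be >= 0
        opts.add("-I");
        opts.add(String.valueOf(maxIterations));   // < 1: until fully cleansed
        if (invert) {
            opts.add("-V");                        // flag only, takes no value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = build("weka.classifiers.rules.ZeroR", -1, 0, 0.1, 0, false);
        // prints: -W weka.classifiers.rules.ZeroR -C -1 -F 0 -T 0.1 -I 0
        System.out.println(String.join(" ", opts));
    }
}
```

    The sketch writes every flag explicitly so the documented defaults are visible; an array returned by getOptions may be laid out differently.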
     
    Version:
    $Revision: 4584 $
    Author:
    Richard Kirkby (rkirkby@cs.waikato.ac.nz), Malcolm Ware (mfw4@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected int m_classIndex
      The attribute to treat as the class for purposes of cleansing.
      protected weka.classifiers.Classifier m_cleansingClassifier
      The classifier used to do the cleansing
      protected boolean m_firstBatchFinished
      Have we processed the first batch (i.e. training data)?
      protected boolean m_invertMatching
      Whether to invert the match so the correctly classified instances are discarded
      protected double m_numericClassifyThreshold
      The threshold for deciding when a numeric value is correctly classified
      protected double m_numericClassifyThresholdAbs
      If the absolute error is less than this, the prediction is accepted as correct.
      protected int m_numOfCleansingIterations
      The maximum number of cleansing iterations to perform (<1 = until fully cleansed)
      protected int m_numOfCrossValidationFolds
      The number of cross validation folds to perform (<2 = no cross validation)
      • Fields inherited from class weka.filters.Filter

        m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
    • Method Summary

      Modifier and Type Method Description
      String absErrTipText()
      Returns the tip text for this property
      boolean batchFinished()
      Signify that this batch of input to the filter is finished.
      String classifierTipText()
      Returns the tip text for this property
      String classIndexTipText()
      Returns the tip text for this property
      double getAbsErr()
      Gets the absolute-error threshold used when predicting a numeric class.
      weka.core.Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      weka.classifiers.Classifier getClassifier()
      Gets the classifier used by the filter.
      protected String getClassifierSpec()
      Gets the classifier specification string, which contains the class name of the classifier and any options to the classifier.
      int getClassIndex()
      Gets the attribute on which misclassifications are based.
      boolean getInvert()
      Get whether selection is inverted.
      int getMaxIterations()
      Gets the maximum number of cleansing iterations performed
      int getNumFolds()
      Gets the number of cross-validation folds used by the filter.
      String[] getOptions()
      Gets the current settings of the filter.
      String getRevision()
      Returns the revision string.
      double getThreshold()
      Gets the threshold for the max error when predicting a numeric class.
      String globalInfo()
      Returns a string describing this filter
      boolean input​(weka.core.Instance instance)
      Input an instance for filtering.
      String invertTipText()
      Returns the tip text for this property
      Enumeration listOptions()
      Returns an enumeration describing the available options.
      static void main​(String[] argv)
      Main method for testing this class.
      String maxIterationsTipText()
      Returns the tip text for this property
      String numFoldsTipText()
      Returns the tip text for this property
      void setAbsErr​(double threshold)
      Sets the absolute-error threshold used when predicting a numeric class.
      void setClassifier​(weka.classifiers.Classifier classifier)
      Sets the classifier to classify instances with.
      void setClassIndex​(int classIndex)
      Sets the attribute on which misclassifications are based.
      boolean setInputFormat​(weka.core.Instances instanceInfo)
      Sets the format of the input instances.
      void setInvert​(boolean invert)
      Set whether selection is inverted.
      void setMaxIterations​(int iterations)
      Sets the maximum number of cleansing iterations to perform; a value < 1 means iterate until fully cleansed.
      void setNumFolds​(int numOfFolds)
      Sets the number of cross-validation folds to use; a value < 2 means no cross-validation.
      void setOptions​(String[] options)
      Parses a given list of options.
      void setThreshold​(double threshold)
      Sets the threshold for the max error when predicting a numeric class.
      String thresholdTipText()
      Returns the tip text for this property.
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • m_cleansingClassifier

        protected weka.classifiers.Classifier m_cleansingClassifier
        The classifier used to do the cleansing
      • m_classIndex

        protected int m_classIndex
        The attribute to treat as the class for purposes of cleansing.
      • m_numOfCrossValidationFolds

        protected int m_numOfCrossValidationFolds
        The number of cross validation folds to perform (<2 = no cross validation)
      • m_numOfCleansingIterations

        protected int m_numOfCleansingIterations
        The maximum number of cleansing iterations to perform (<1 = until fully cleansed)
      • m_numericClassifyThreshold

        protected double m_numericClassifyThreshold
        The threshold for deciding when a numeric value is correctly classified
      • m_numericClassifyThresholdAbs

        protected double m_numericClassifyThresholdAbs
        If the absolute error is less than this, the prediction is accepted as correct.
      • m_invertMatching

        protected boolean m_invertMatching
        Whether to invert the match so the correctly classified instances are discarded
      • m_firstBatchFinished

        protected boolean m_firstBatchFinished
        Have we processed the first batch (i.e. training data)?
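
        The description of m_numericClassifyThresholdAbs can be read as a simple acceptance test on the absolute error. The sketch below illustrates that reading in plain Java; the class AbsErrorCheckSketch and its method are hypothetical, and how the filter combines this value with m_numericClassifyThreshold is not stated in this page, so that interaction is left out.

```java
/** Hypothetical sketch (not the filter's source): an absolute-error acceptance test. */
final class AbsErrorCheckSketch {

    /** Returns true when |actual - predicted| is strictly less than thresholdAbs. */
    static boolean withinAbsThreshold(double actual, double predicted, double thresholdAbs) {
        return Math.abs(actual - predicted) < thresholdAbs;
    }

    public static void main(String[] args) {
        // Small error: accepted as correctly predicted.
        System.out.println(withinAbsThreshold(10.0, 10.05, 0.1)); // prints: true
        // Large error: would count as misclassified.
        System.out.println(withinAbsThreshold(10.0, 10.5, 0.1));  // prints: false
    }
}
```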
    • Constructor Detail

      • RemoveMisclassifiedAbs

        public RemoveMisclassifiedAbs()
    • Method Detail

      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the Capabilities of this filter.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this object
        See Also:
        Capabilities
      • setInputFormat

        public boolean setInputFormat​(weka.core.Instances instanceInfo)
                               throws Exception
        Sets the format of the input instances.
        Overrides:
        setInputFormat in class weka.filters.Filter
        Parameters:
        instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
        Returns:
        true if the outputFormat may be collected immediately
        Throws:
        Exception - if the inputFormat can't be set successfully
      • input

        public boolean input​(weka.core.Instance instance)
                      throws Exception
        Input an instance for filtering.
        Overrides:
        input in class weka.filters.Filter
        Parameters:
        instance - the input instance
        Returns:
        true if the filtered instance may now be collected with output().
        Throws:
        NullPointerException - if the input format has not been defined.
        Exception - if the input instance was not of the correct format or if there was a problem with the filtering.
      • batchFinished

        public boolean batchFinished()
                              throws Exception
        Signify that this batch of input to the filter is finished.
        Overrides:
        batchFinished in class weka.filters.Filter
        Returns:
        true if there are instances pending output
        Throws:
        IllegalStateException - if no input structure has been defined
        Exception
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions​(String[] options)
                        throws Exception
        Parses a given list of options.

        Valid options are:

         -W <classifier specification>
          Full class name of classifier to use, followed
          by scheme options. eg:
           "weka.classifiers.bayes.NaiveBayes -D"
          (default: weka.classifiers.rules.ZeroR)
         -C <class index>
          Attribute on which misclassifications are based.
          If < 0, the currently set class is used if there is one; otherwise the last attribute is used.
         -F <number of folds>
          The number of folds to use for cross-validation cleansing.
          (<2 = no cross-validation - default).
         -T <threshold>
          Threshold for the max error when predicting numeric class.
          (Value should be >= 0, default = 0.1).
         -I <max iterations>
          The maximum number of cleansing iterations to perform.
          (<1 = until fully cleansed - default)
         -V
          Invert the match so that correctly classified instances are discarded.
         
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • getOptions

        public String[] getOptions()
        Gets the current settings of the filter.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        an array of strings suitable for passing to setOptions
      • globalInfo

        public String globalInfo()
        Returns a string describing this filter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • classifierTipText

        public String classifierTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setClassifier

        public void setClassifier​(weka.classifiers.Classifier classifier)
        Sets the classifier to classify instances with.
        Parameters:
        classifier - The classifier to be used (with its options set).
      • getClassifier

        public weka.classifiers.Classifier getClassifier()
        Gets the classifier used by the filter.
        Returns:
        The classifier to be used.
      • getClassifierSpec

        protected String getClassifierSpec()
        Gets the classifier specification string, which contains the class name of the classifier and any options to the classifier.
        Returns:
        the classifier string.
      • classIndexTipText

        public String classIndexTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setClassIndex

        public void setClassIndex​(int classIndex)
        Sets the attribute on which misclassifications are based. If < 0, the currently set class is used if there is one; otherwise the last attribute is used.
        Parameters:
        classIndex - the class index.
      • getClassIndex

        public int getClassIndex()
        Gets the attribute on which misclassifications are based.
        Returns:
        the class index.
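
        The fallback rule for a negative class index can be sketched as a small resolution function. This is an illustration only: ClassIndexSketch and resolve are hypothetical names, and the sketch assumes that a negative currentClassIndex means no class is set on the data.

```java
/** Hypothetical sketch of the documented class-index fallback. */
final class ClassIndexSketch {

    /**
     * requested >= 0: use it as given.
     * requested < 0: use the class already set on the data, else the last attribute.
     */
    static int resolve(int requested, int currentClassIndex, int numAttributes) {
        if (requested >= 0) {
            return requested;
        }
        if (currentClassIndex >= 0) {   // assumption: negative means "no class set"
            return currentClassIndex;
        }
        return numAttributes - 1;       // default to the last attribute
    }

    public static void main(String[] args) {
        System.out.println(resolve(2, 0, 5));   // prints: 2
        System.out.println(resolve(-1, 3, 5));  // prints: 3
        System.out.println(resolve(-1, -1, 5)); // prints: 4
    }
}
```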
      • numFoldsTipText

        public String numFoldsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumFolds

        public void setNumFolds​(int numOfFolds)
        Sets the number of cross-validation folds to use; a value < 2 means no cross-validation.
        Parameters:
        numOfFolds - the number of folds.
      • getNumFolds

        public int getNumFolds()
        Gets the number of cross-validation folds used by the filter.
        Returns:
        the number of folds.
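
        To make the fold setting concrete, the sketch below partitions instance indices into held-out folds. In cross-validation generally, each held-out fold is judged by a model trained on the remaining folds. How this filter actually partitions the data is not stated in this page, so a plain round-robin assignment is swapped in here; FoldSplitSketch and testFolds are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: round-robin fold assignment, not the filter's own scheme. */
final class FoldSplitSketch {

    /** Returns the held-out indices per fold; an empty list when numFolds < 2. */
    static List<List<Integer>> testFolds(int numInstances, int numFolds) {
        List<List<Integer>> folds = new ArrayList<>();
        if (numFolds < 2) {
            return folds;                       // < 2 means no cross-validation
        }
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numInstances; i++) {
            folds.get(i % numFolds).add(i);     // instance i is held out in fold i mod k
        }
        return folds;
    }

    public static void main(String[] args) {
        System.out.println(testFolds(7, 3)); // prints: [[0, 3, 6], [1, 4], [2, 5]]
        System.out.println(testFolds(7, 1)); // prints: []
    }
}
```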
      • absErrTipText

        public String absErrTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setAbsErr

        public void setAbsErr​(double threshold)
        Sets the absolute-error threshold used when predicting a numeric class. The value should be >= 0.
        Parameters:
        threshold - the numeric threshold.
      • getAbsErr

        public double getAbsErr()
        Gets the absolute-error threshold used when predicting a numeric class.
        Returns:
        the numeric threshold.
      • thresholdTipText

        public String thresholdTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setThreshold

        public void setThreshold​(double threshold)
        Sets the threshold for the max error when predicting a numeric class. The value should be >= 0.
        Parameters:
        threshold - the numeric threshold.
      • getThreshold

        public double getThreshold()
        Gets the threshold for the max error when predicting a numeric class.
        Returns:
        the numeric threshold.
      • maxIterationsTipText

        public String maxIterationsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMaxIterations

        public void setMaxIterations​(int iterations)
        Sets the maximum number of cleansing iterations to perform; a value < 1 means iterate until fully cleansed.
        Parameters:
        iterations - the maximum number of iterations.
      • getMaxIterations

        public int getMaxIterations()
        Gets the maximum number of cleansing iterations performed
        Returns:
        the maximum number of iterations.
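
        The effect of the iteration cap can be illustrated with a toy cleansing loop. This is a sketch under stated assumptions, not the filter's implementation: a mean predictor stands in for the configured classifier, it is refitted on the surviving values in each pass, and a value survives when its error does not exceed the threshold. IterativeCleansingSketch and cleanse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: repeated cleansing with a mean predictor as stand-in classifier. */
final class IterativeCleansingSketch {

    /** maxIterations < 1 means: repeat until a pass removes nothing. */
    static List<Double> cleanse(List<Double> values, double threshold, int maxIterations) {
        List<Double> current = new ArrayList<>(values);
        int iteration = 0;
        while (maxIterations < 1 || iteration < maxIterations) {
            // "Train" the stand-in model: predict the mean of the surviving values.
            double mean = current.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            List<Double> kept = new ArrayList<>();
            for (double v : current) {
                if (Math.abs(v - mean) <= threshold) {
                    kept.add(v);
                }
            }
            boolean changed = kept.size() != current.size();
            current = kept;
            iteration++;
            if (!changed) {
                break;  // fully cleansed: this pass removed nothing
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(0.0, 0.0, 0.0, 0.0, 3.0, 10.0);
        System.out.println(cleanse(values, 2.3, 0)); // prints: [0.0, 0.0, 0.0, 0.0]
        System.out.println(cleanse(values, 2.3, 1)); // prints: [0.0, 0.0, 0.0, 0.0, 3.0]
    }
}
```

        In this data the first pass removes only 10.0; once it is gone the mean drops and 3.0 falls outside the threshold, which is why the uncapped run removes one more value than the run capped at a single iteration.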
      • invertTipText

        public String invertTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setInvert

        public void setInvert​(boolean invert)
        Set whether selection is inverted.
        Parameters:
        invert - whether or not to invert selection.
      • getInvert

        public boolean getInvert()
        Get whether selection is inverted.
        Returns:
        whether or not selection is inverted.
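
        The invert setting flips which side of the classification outcome is kept. A minimal sketch, assuming the per-instance outcomes are already available as a boolean array; InvertMatchSketch and keptIndices are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the invert switch applied to per-instance outcomes. */
final class InvertMatchSketch {

    /**
     * Without invert, correctly classified instances are kept.
     * With invert, correctly classified instances are discarded instead.
     */
    static List<Integer> keptIndices(boolean[] correct, boolean invert) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < correct.length; i++) {
            if (correct[i] != invert) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        boolean[] correct = {true, false, true};
        System.out.println(keptIndices(correct, false)); // prints: [0, 2]
        System.out.println(keptIndices(correct, true));  // prints: [1]
    }
}
```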
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main​(String[] argv)
        Main method for testing this class.
        Parameters:
        argv - should contain arguments to the filter: use -h for help