Class RemoveOutliers

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler

    public class RemoveOutliers
    extends weka.filters.SimpleBatchFilter
    implements weka.core.Randomizable
    Cross-validates the specified classifier on the incoming data, then applies the outlier detector to the actual vs. predicted values and removes the instances it flags as outliers.
    NB: works only on a complete dataset (batch mode), not instance by instance.
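The idea described above can be illustrated with a minimal, library-free sketch. It is not the filter's actual implementation: the class `RemoveOutliersSketch`, its method names, the single-attribute least-squares model, the modulo-based fold assignment and the fixed error threshold are all assumptions chosen to keep the example small. In the real filter the classifier and the detector are pluggable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the filter's idea; every name here is illustrative. */
class RemoveOutliersSketch {

    /** Fits y = a + b*x by least squares on all rows outside the given test fold. */
    static double[] fit(double[] x, double[] y, int fold, int folds) {
        double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < x.length; i++) {
            if (i % folds == fold) continue; // row is held out for testing
            n++; sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        double denom = n * sxx - sx * sx;
        double b = denom == 0 ? 0 : (n * sxy - sx * sy) / denom;
        double a = (sy - b * sx) / n; // assumes folds >= 2, so n > 0
        return new double[] {a, b};
    }

    /** Returns the out-of-fold prediction for every row. */
    static double[] crossValidatedPredictions(double[] x, double[] y, int folds) {
        double[] predicted = new double[x.length];
        for (int fold = 0; fold < folds; fold++) {
            double[] model = fit(x, y, fold, folds);
            for (int i = fold; i < x.length; i += folds) {
                predicted[i] = model[0] + model[1] * x[i];
            }
        }
        return predicted;
    }

    /** Keeps the row indices whose absolute prediction error is at most maxError. */
    static List<Integer> keepInliers(double[] actual, double[] predicted, double maxError) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < actual.length; i++) {
            if (Math.abs(actual[i] - predicted[i]) <= maxError) kept.add(i);
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] y = {2, 4, 6, 8, 50, 12, 14, 16, 18, 20}; // row 4 is far off the y = 2x line
        double[] predicted = crossValidatedPredictions(x, y, 5);
        System.out.println(keepInliers(y, predicted, 10.0)); // prints [0, 1, 2, 3, 5, 6, 7, 8, 9]
    }
}
```

Because each row is predicted by a model that never saw it, a row that disagrees with the rest of the data gets a large error and is dropped. This also shows why the filter needs the whole batch: a single instance cannot be cross-validated.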

    Valid options are:

     -classifier <value>
      The classifier to use for generating the actual vs predicted data.
      (default: LinearRegression, not yet built)
     -num-folds <value>
      The number of folds to use in the cross-validation.
      (default: 10)
     -num-threads <value>
      The number of threads to use for cross-validation; -1 = number of CPUs/cores; 0 or 1 = sequential execution.
      (default: 1)
     -detector <value>
      The outlier detector to use.
     -output-debug-info
      If set, filter is run in debug mode and
      may output additional info to the console
     -do-not-check-capabilities
      If set, filter capabilities are not checked before filter is built
      (use with caution).
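The options above are passed as a flat array of strings, with each valued flag followed by its value. The sketch below only illustrates that array layout; `OptionsSketch`, `optionValue` and `hasFlag` are hypothetical helpers written for this example and are not part of Weka's option-parsing API.

```java
import java.util.Arrays;

/** Hypothetical helpers that mimic looking up flags in an options array. */
class OptionsSketch {

    /** Returns the value following the given flag, or the fallback if the flag is absent. */
    static String optionValue(String[] options, String flag, String fallback) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) return options[i + 1];
        }
        return fallback;
    }

    /** Returns true if the valueless flag is present. */
    static boolean hasFlag(String[] options, String flag) {
        return Arrays.asList(options).contains(flag);
    }

    public static void main(String[] args) {
        // Five folds, one thread per core, debug output switched on.
        String[] options = {"-num-folds", "5", "-num-threads", "-1", "-output-debug-info"};
        System.out.println(optionValue(options, "-num-folds", "10"));   // prints 5
        System.out.println(optionValue(options, "-num-threads", "1"));  // prints -1
        System.out.println(hasFlag(options, "-output-debug-info"));     // prints true
        System.out.println(hasFlag(options, "-do-not-check-capabilities")); // prints false
    }
}
```

An array of this shape is what `setOptions(String[])` accepts; flags that are left out fall back to the defaults listed above.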
    Version:
    $Revision$
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static String CLASSIFIER  
      static String DETECTOR  
      protected weka.classifiers.Classifier m_Classifier
      the classifier to use for evaluation.
      protected AbstractOutlierDetector m_Detector
      the outlier detector to use.
      protected int m_NumFolds
      the number of folds to use.
      protected int m_NumThreads
      the number of threads to use for parallel execution.
      protected int m_Seed
      the seed value.
      static String NUM_FOLDS  
      static String NUM_THREADS  
      • Fields inherited from class weka.filters.Filter

        m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
    • Constructor Summary

      Constructors 
      Constructor Description
      RemoveOutliers()  
    • Method Summary

      Modifier and Type Method Description
      String classifierTipText()
      Returns the tip text for this property.
      protected weka.classifiers.Evaluation crossValidate(weka.core.Instances data, int folds)
      Cross-validates the classifier on the given data.
      String detectorTipText()
      Returns the tip text for this property.
      protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns it.
      protected SpreadSheet evaluationToSpreadSheet(weka.classifiers.Evaluation eval)
      Turns the predictions of the evaluation object into a spreadsheet.
      weka.core.Capabilities getCapabilities()
      Returns the filter's capabilities.
      weka.classifiers.Classifier getClassifier()
      Returns the classifier.
      protected weka.classifiers.Classifier getDefaultClassifier()
      Returns the default classifier.
      protected AbstractOutlierDetector getDefaultDetector()
      Returns the default detector.
      protected int getDefaultNumFolds()
      Returns the default number of folds to use in CV.
      protected int getDefaultNumThreads()
      Returns the default number of threads to use for cross-validation.
      protected int getDefaultSeed()
      Returns the default seed value.
      AbstractOutlierDetector getDetector()
      Returns the detector.
      int getNumFolds()
      Returns the number of folds to use in CV.
      int getNumThreads()
      Returns the number of threads to use for cross-validation.
      String[] getOptions()
      Gets the current option settings for the OptionHandler.
      int getSeed()
      Returns the seed value.
      String globalInfo()
      Returns a string describing this filter.
      Enumeration listOptions()
      Returns an enumeration describing the available options.
      String numFoldsTipText()
      Returns the tip text for this property.
      String numThreadsTipText()
      Returns the tip text for this property.
      protected weka.core.Instances process(weka.core.Instances data)
      Processes the given data (may change the provided dataset) and returns the modified version.
      String seedTipText()
      Returns the tip text for this property.
      void setClassifier(weka.classifiers.Classifier value)
      Sets the classifier.
      void setDetector(AbstractOutlierDetector value)
      Sets the detector.
      void setNumFolds(int value)
      Sets the number of folds to use.
      void setNumThreads(int value)
      Sets the number of threads to use for cross-validation.
      void setOptions(String[] options)
      Sets the OptionHandler's options using the given list.
      void setSeed(int value)
      Sets the seed value.
      • Methods inherited from class weka.filters.SimpleBatchFilter

        allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input, input
      • Methods inherited from class weka.filters.SimpleFilter

        reset, setInputFormat
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, getRevision, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, main, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • m_Classifier

        protected weka.classifiers.Classifier m_Classifier
        the classifier to use for evaluation.
      • m_Seed

        protected int m_Seed
        the seed value.
      • m_NumFolds

        protected int m_NumFolds
        the number of folds to use.
      • m_NumThreads

        protected int m_NumThreads
        the number of threads to use for parallel execution.
    • Constructor Detail

      • RemoveOutliers

        public RemoveOutliers()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the Explorer/Experimenter GUI
      • getDefaultClassifier

        protected weka.classifiers.Classifier getDefaultClassifier()
        Returns the default classifier.
        Returns:
        the default classifier
      • setClassifier

        public void setClassifier(weka.classifiers.Classifier value)
        Sets the classifier.
        Parameters:
        value - the classifier
      • getClassifier

        public weka.classifiers.Classifier getClassifier()
        Returns the classifier.
        Returns:
        the classifier
      • classifierTipText

        public String classifierTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultSeed

        protected int getDefaultSeed()
        Returns the default seed value.
        Returns:
        the default seed
      • setSeed

        public void setSeed(int value)
        Sets the seed value.
        Specified by:
        setSeed in interface weka.core.Randomizable
        Parameters:
        value - the seed
      • getSeed

        public int getSeed()
        Returns the seed value.
        Specified by:
        getSeed in interface weka.core.Randomizable
        Returns:
        the seed
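A seed typically exists so that any randomization, such as shuffling the data before it is split into folds, gives the same result on every run. Whether this filter uses the seed in exactly that way is an assumption; the sketch below only demonstrates the general principle with `java.util.Random`, and `SeedSketch` and `shuffledIndices` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of seed-controlled, repeatable shuffling. */
class SeedSketch {

    /** Returns the indices 0..n-1 in an order determined entirely by the seed. */
    static List<Integer> shuffledIndices(int n, int seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));
        return indices;
    }

    public static void main(String[] args) {
        // The same seed always reproduces the same order.
        System.out.println(shuffledIndices(10, 1).equals(shuffledIndices(10, 1))); // prints true
    }
}
```

With a fixed seed, repeated runs on the same data should flag the same instances, which makes results comparable across experiments.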
      • seedTipText

        public String seedTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultNumFolds

        protected int getDefaultNumFolds()
        Returns the default number of folds to use in CV.
        Returns:
        the default folds
      • setNumFolds

        public void setNumFolds(int value)
        Sets the number of folds to use.
        Parameters:
        value - the folds
      • getNumFolds

        public int getNumFolds()
        Returns the number of folds to use in CV.
        Returns:
        the folds
      • numFoldsTipText

        public String numFoldsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultNumThreads

        protected int getDefaultNumThreads()
        Returns the default number of threads to use for cross-validation.
        Returns:
        the default number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
      • setNumThreads

        public void setNumThreads(int value)
        Sets the number of threads to use for cross-validation.
        Parameters:
        value - the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
      • getNumThreads

        public int getNumThreads()
        Returns the number of threads to use for cross-validation.
        Returns:
        the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
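The thread-count convention (-1 = number of CPUs/cores; 0 or 1 = sequential execution) can be written out as a small helper. This is a sketch of the documented convention only: `ThreadCountSketch` and `resolveThreads` are hypothetical names, and treating values below -1 as sequential and passing values above 1 through unchanged are assumptions the page does not confirm.

```java
/** Hypothetical helper that maps the numThreads setting to an actual worker count. */
class ThreadCountSketch {

    /**
     * @param numThreads     the configured value (-1, 0, 1 or a positive count)
     * @param availableCores the number of cores reported by the runtime
     * @return the number of worker threads to use; 1 means sequential execution
     */
    static int resolveThreads(int numThreads, int availableCores) {
        if (numThreads == -1) return Math.max(1, availableCores); // one thread per core
        if (numThreads <= 1) return 1; // 0 or 1: sequential (other negatives too, by assumption)
        return numThreads;             // explicit thread count, used as given (assumption)
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(resolveThreads(-1, cores)); // depends on the machine
        System.out.println(resolveThreads(0, cores));  // prints 1
        System.out.println(resolveThreads(4, cores));  // prints 4
    }
}
```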
      • numThreadsTipText

        public String numThreadsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultDetector

        protected AbstractOutlierDetector getDefaultDetector()
        Returns the default detector.
        Returns:
        the default detector
      • setDetector

        public void setDetector(AbstractOutlierDetector value)
        Sets the detector.
        Parameters:
        value - the detector
      • detectorTipText

        public String detectorTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions(String[] options)
                        throws Exception
        Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • getOptions

        public String[] getOptions()
        Gets the current option settings for the OptionHandler.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        the list of current option settings as an array of strings
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the filter's capabilities.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns it. If the output format cannot be determined immediately, i.e., hasImmediateOutputFormat() returns false, this method is called from batchFinished().
        Specified by:
        determineOutputFormat in class weka.filters.SimpleFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
      • crossValidate

        protected weka.classifiers.Evaluation crossValidate(weka.core.Instances data,
                                                            int folds)
                                                     throws Exception
        Cross-validates the classifier on the given data.
        Parameters:
        data - the data to use for cross-validation
        folds - the number of folds
        Returns:
        the evaluation
        Throws:
        Exception - if cross-validation fails
      • evaluationToSpreadSheet

        protected SpreadSheet evaluationToSpreadSheet(weka.classifiers.Evaluation eval)
                                               throws Exception
        Turns the predictions of the evaluation object into a spreadsheet.
        Parameters:
        eval - the evaluation object to convert
        Returns:
        the generated spreadsheet
        Throws:
        Exception - if the conversion fails
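To give a rough idea of what turning predictions into a table can look like, the sketch below writes one row per prediction. The column names (Actual, Predicted, Error), the CSV output and the names `PredictionTableSketch` and `toCsv` are assumptions made for this example; the layout of the real SpreadSheet is not documented on this page.

```java
/** Hypothetical illustration of tabulating actual vs. predicted values. */
class PredictionTableSketch {

    /** Builds a CSV table with one row per prediction; Error is predicted minus actual. */
    static String toCsv(double[] actual, double[] predicted) {
        StringBuilder sb = new StringBuilder("Actual,Predicted,Error\n");
        for (int i = 0; i < actual.length; i++) {
            sb.append(actual[i]).append(',')
              .append(predicted[i]).append(',')
              .append(predicted[i] - actual[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv(new double[] {2.0, 4.0}, new double[] {2.5, 3.0}));
    }
}
```

A tabular form like this lets an outlier detector work on plain columns of numbers without knowing anything about the classifier that produced them.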
      • process

        protected weka.core.Instances process(weka.core.Instances data)
                                       throws Exception
        Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
        Specified by:
        process in class weka.filters.SimpleFilter
        Parameters:
        data - the data to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong