Class PredictionErrorIQR

  • All Implemented Interfaces:
    adams.core.CleanUpHandler, adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.Randomizable, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.ThreadLimiter, PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>, adams.flow.core.FlowContextHandler, Serializable, Comparable

    public class PredictionErrorIQR
    extends AbstractPostProcessor
    implements adams.core.Randomizable, adams.core.StoppableWithFeedback, adams.core.ThreadLimiter, PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>
    Post-processor that removes outliers using a coarse IQR approach on the prediction errors of one or more classifiers.

    parameters:
    - list of classifiers
    - for each classifier: number of folds, an IQR multiplier, number of iterations, and maximum number of consecutive non-removal iterations before stopping

    algorithm:
    for each classifier
      loop
        cross-validate, get predictions, remove all examples where error > percentile75 + IQR * multiplier of the errors
        stop after num_iterations iterations, or after the configured number of consecutive iterations with zero removals
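
The loop above can be sketched in plain Java. This is a minimal illustration, not the class's actual implementation: `IterativeIqrCleaner`, `clean` and `errorsOf` are hypothetical names, `errorsOf` stands in for "cross-validate a classifier and collect one error per example", and the linear-interpolation percentile is an assumed definition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Sketch of the iterative outlier removal; hypothetical, not the ADAMS code. */
final class IterativeIqrCleaner {

  /** Percentile (p in [0,1]) using linear interpolation; an assumed definition. */
  static double percentile(double[] values, double p) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    double pos = p * (sorted.length - 1);
    int lower = (int) Math.floor(pos);
    int upper = (int) Math.ceil(pos);
    return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
  }

  /**
   * Repeatedly removes rows whose error exceeds percentile75 + IQR * multiplier.
   * errorsOf is a stand-in for cross-validation: one error per row, same order.
   * A stop criterion of 0 is ignored, as documented for the iterate method.
   */
  static <T> List<T> clean(List<T> data, Function<List<T>, double[]> errorsOf,
                           double multiplier, int numIterations, int maxNonRemoval) {
    // Choice made for this sketch only: refuse to run without any stop criterion.
    if (numIterations <= 0 && maxNonRemoval <= 0)
      throw new IllegalArgumentException("need at least one stop criterion > 0");
    List<T> current = new ArrayList<>(data);
    int iteration = 0;
    int nonRemoval = 0;
    while (!current.isEmpty()) {
      double[] errors = errorsOf.apply(current);
      double q1 = percentile(errors, 0.25);
      double q3 = percentile(errors, 0.75);
      double threshold = q3 + (q3 - q1) * multiplier;
      List<T> kept = new ArrayList<>();
      for (int i = 0; i < current.size(); i++) {
        if (errors[i] <= threshold)
          kept.add(current.get(i));
      }
      // Count consecutive passes that removed nothing; any removal resets it.
      nonRemoval = (kept.size() == current.size()) ? nonRemoval + 1 : 0;
      current = kept;
      iteration++;
      if (numIterations > 0 && iteration >= numIterations) break;
      if (maxNonRemoval > 0 && nonRemoval >= maxNonRemoval) break;
    }
    return current;
  }
}
```

With several classifiers, the output of `clean` for one classifier would simply become the input for the next.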


    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
        min-user-mode: Expert
     
    -classifier <weka.classifiers.Classifier> [-classifier ...] (property: classifiers)
        The classifiers to cross-validate internally, using their prediction errors
        to determine outliers.
        default: weka.classifiers.rules.ZeroR
     
    -num-folds <adams.core.base.BaseInteger> [-num-folds ...] (property: numFolds)
        The number of cross-validation folds per classifier.
        default: 10
     
    -iqr-multiplier <adams.core.base.BaseDouble> [-iqr-multiplier ...] (property: IQRMultiplier)
        The multiplier for the IQR filter to determine outlier values.
        default: 0.1
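
To make the effect of the multiplier concrete, here is a small worked example of the cutoff `percentile75 + IQR * multiplier` from the algorithm description. The class and method names are hypothetical and the quartile values are made up for illustration.

```java
/** Worked example of the outlier cutoff; names and numbers are illustrative only. */
final class IqrThreshold {

  /** Upper cutoff: third quartile plus the interquartile range times the multiplier. */
  static double upperThreshold(double q1, double q3, double multiplier) {
    return q3 + (q3 - q1) * multiplier;
  }

  public static void main(String[] args) {
    double q1 = 10.0;
    double q3 = 30.0; // IQR = 20.0
    // Default multiplier 0.1: the cutoff sits just above the third quartile (about 32).
    System.out.println(upperThreshold(q1, q3, 0.1));
    // The conventional box-plot multiplier 1.5 is far more lenient (about 60).
    System.out.println(upperThreshold(q1, q3, 1.5));
  }
}
```

Because the cutoff with a small multiplier lies close to the 75th percentile, a single pass can remove up to roughly a quarter of the examples, which is worth keeping in mind when combining it with many iterations.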
     
    -num-iterations <adams.core.base.BaseInteger> [-num-iterations ...] (property: numIterations)
        The number of iterations per classifier; ignored if 0.
        default: 0
     
    -max-non-removal-iterations <adams.core.base.BaseInteger> [-max-non-removal-iterations ...] (property: maxNonRemovalIterations)
        The maximum number of consecutive non-removal iterations per classifier before stopping.
        default: 2
     
    -seed <long> (property: seed)
        The seed value for the randomization.
        default: 1
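
The seed makes the randomization of the data for cross-validation reproducible. The following sketch shows the general idea with a seeded shuffle and a round-robin fold assignment. It is an assumption-laden illustration (hypothetical `SeededFolds` class, no stratification), not the fold generation that WEKA or ADAMS actually performs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of reproducible fold assignment; hypothetical, not the library's procedure. */
final class SeededFolds {

  /** Returns, for each row index, the fold (0..numFolds-1) in which it is tested. */
  static int[] assign(int numRows, int numFolds, long seed) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < numRows; i++)
      order.add(i);
    // The same seed always yields the same shuffled order.
    Collections.shuffle(order, new Random(seed));
    int[] fold = new int[numRows];
    for (int pos = 0; pos < numRows; pos++)
      fold[order.get(pos)] = pos % numFolds;
    return fold;
  }
}
```

Calling `SeededFolds.assign(25, 10, 1L)` twice returns identical arrays, which is what makes repeated cleaning runs comparable.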
     
    -use-absolute-error <boolean> (property: useAbsoluteError)
        If set to true, then the error will be absolute (no direction).
        default: true
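
Since the algorithm only removes examples whose error lies above an upper cutoff, the direction of the error matters once absolute errors are switched off. A minimal sketch, assuming the error is computed as predicted minus actual (the sign convention and the `PredictionError` class are assumptions, not taken from the implementation):

```java
/** Sketch of signed vs. absolute prediction error; the sign convention is assumed. */
final class PredictionError {

  /** Error of one prediction, either with direction or as a magnitude. */
  static double error(double actual, double predicted, boolean useAbsolute) {
    double diff = predicted - actual; // assumption: predicted - actual
    return useAbsolute ? Math.abs(diff) : diff;
  }
}
```

Under this convention a strong under-prediction yields a large negative signed error, which never exceeds an upper cutoff, whereas its absolute value would.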
     
    -num-threads <int> (property: numThreads)
        The number of threads to use for parallel execution; > 0: specific number
        of cores to use (capped by the actual number of cores available, 1 = sequential
        execution); = 0: number of cores; < 0: number of cores to leave free (e.g., -2
        leaves 2 cores free; a minimum of one core is used); overrides the value defined
        by the fold generator scheme.
        default: 1
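
The rules in the option description can be written down as a small function. This is a sketch of the documented behaviour only; `ThreadCount.resolve` is a hypothetical helper, and the number of available cores is passed in explicitly so the logic can be checked in isolation.

```java
/** Sketch of how a numThreads value maps to an actual thread count; hypothetical helper. */
final class ThreadCount {

  /** Resolves the requested value against the number of available cores. */
  static int resolve(int numThreads, int availableCores) {
    if (numThreads > 0)
      return Math.min(numThreads, availableCores); // capped by available cores
    if (numThreads == 0)
      return availableCores;                       // use all cores
    return Math.max(1, availableCores + numThreads); // leave cores free, at least one used
  }

  public static void main(String[] args) {
    int cores = Runtime.getRuntime().availableProcessors();
    System.out.println(resolve(-2, cores));
  }
}
```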
     
    Author:
    fracpete (fracpete at waikato dot ac dot nz), Dale (dale at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected weka.classifiers.Classifier[] m_Classifiers
      the classifiers to use for internal cross-validation.
      protected int m_CountAfter
      the count after.
      protected int m_CountBefore
      the count before.
      protected adams.multiprocess.WekaCrossValidationExecution m_CrossValidation
      the current evaluation.
      protected adams.flow.core.Actor m_FlowContext
      the flow context.
      protected adams.core.base.BaseDouble[] m_IQRMultiplier
      the IQR multiplier per classifier.
      protected adams.core.base.BaseInteger[] m_MaxNonRemovalIterations
      the maximum number of non-removal iterations per classifier.
      protected adams.core.base.BaseInteger[] m_NumFolds
      the number of folds per classifier.
      protected adams.core.base.BaseInteger[] m_NumIterations
      the number of iterations per classifier.
      protected int m_NumThreads
      the number of threads to use for parallel execution.
      protected Random m_Random
      the random number generator in use.
      protected long m_Seed
      the seed value.
      protected boolean m_Stopped
      whether the execution was stopped.
      protected boolean m_UseAbsoluteError
      whether to use absolute errors.
      • Fields inherited from class adams.core.option.AbstractOptionHandler

        m_OptionManager
      • Fields inherited from class adams.core.logging.LoggingObject

        m_Logger, m_LoggingIsEnabled, m_LoggingLevel
    • Method Summary

      Modifier and Type Method Description
      protected void adjustArrays​(int length)
      Adjusts the arrays to the new length.
      String classifiersTipText()
      Returns the tip text for this property.
      protected weka.core.Instances cleanData​(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int iteration, int numFolds, long seed, double iqrMultiplier)
      Cleans the data using the specified classifier and parameters.
      void cleanUp()
      Cleans up data structures, frees up memory.
      void defineOptions()  
      weka.classifiers.Classifier[] getClassifiers()
      Returns the classifiers to use internally.
      adams.data.spreadsheet.SpreadSheet getDetails()
      Returns details for the cleaner.
      adams.core.base.BaseDouble[] getIQRMultiplier()
      Returns the IQR multipliers per classifier.
      adams.core.base.BaseInteger[] getMaxNonRemovalIterations()
      Returns the maximum number of non-removal iterations per classifier.
      adams.core.base.BaseInteger[] getNumFolds()
      Returns the number of cross-validation folds per classifier.
      adams.core.base.BaseInteger[] getNumIterations()
      Returns the number of iterations per classifier.
      int getNumThreads()
      Returns the number of threads to use for cross-validation.
      long getSeed()
      Returns the seed value.
      boolean getUseAbsoluteError()
      Returns whether to use an absolute error (i.e., no direction).
      String globalInfo()
      Returns a string describing the object.
      String IQRMultiplierTipText()
      Returns the tip text for this property.
      boolean isStopped()
      Whether the execution has been stopped.
      protected weka.core.Instances iterate​(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int numFolds, long seed, double iqrMultiplier, int numIterations, int maxNonRemovalIterations)
      Repeatedly cleans the data using the specified classifier and parameters until a stop criterion is met.
      String maxNonRemovalIterationsTipText()
      Returns the tip text for this property.
      String numFoldsTipText()
      Returns the tip text for this property.
      String numIterationsTipText()
      Returns the tip text for this property.
      String numThreadsTipText()
      Returns the tip text for this property.
      protected weka.core.Instance performPostProcess​(weka.core.Instance data)
      Performs the actual postprocessing.
      protected weka.core.Instances performPostProcess​(weka.core.Instances data)
      Performs the actual postprocessing.
      protected void preCheck​(weka.core.Instances data)
      Performs some pre-checks whether the data is actually suitable.
      String seedTipText()
      Returns the tip text for this property.
      void setClassifiers​(weka.classifiers.Classifier[] value)
      Sets the classifiers to use internally.
      void setIQRMultiplier​(adams.core.base.BaseDouble[] value)
      Sets the IQR multipliers per classifier.
      void setMaxNonRemovalIterations​(adams.core.base.BaseInteger[] value)
      Sets the maximum number of non-removal iterations per classifier.
      void setNumFolds​(adams.core.base.BaseInteger[] value)
      Sets the number of cross-validation folds per classifier.
      void setNumIterations​(adams.core.base.BaseInteger[] value)
      Sets the number of iterations per classifier.
      void setNumThreads​(int value)
      Sets the number of threads to use for cross-validation.
      void setSeed​(long value)
      Sets the seed value.
      void setUseAbsoluteError​(boolean value)
      Sets whether to use an absolute error (i.e., no direction).
      void stopExecution()
      Stops the execution.
      String useAbsoluteErrorTipText()
      Returns the tip text for this property.
      • Methods inherited from class adams.core.option.AbstractOptionHandler

        cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
      • Methods inherited from class adams.core.logging.LoggingObject

        configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
      • Methods inherited from interface adams.core.logging.LoggingLevelHandler

        getLoggingLevel
    • Field Detail

      • m_Classifiers

        protected weka.classifiers.Classifier[] m_Classifiers
        the classifiers to use for internal cross-validation.
      • m_NumFolds

        protected adams.core.base.BaseInteger[] m_NumFolds
        the number of folds per classifier.
      • m_IQRMultiplier

        protected adams.core.base.BaseDouble[] m_IQRMultiplier
        the IQR multiplier per classifier.
      • m_NumIterations

        protected adams.core.base.BaseInteger[] m_NumIterations
        the number of iterations per classifier.
      • m_MaxNonRemovalIterations

        protected adams.core.base.BaseInteger[] m_MaxNonRemovalIterations
        the maximum number of non-removal iterations per classifier.
      • m_Seed

        protected long m_Seed
        the seed value.
      • m_Random

        protected transient Random m_Random
        the random number generator in use.
      • m_UseAbsoluteError

        protected boolean m_UseAbsoluteError
        whether to use absolute errors.
      • m_NumThreads

        protected int m_NumThreads
        the number of threads to use for parallel execution.
      • m_CrossValidation

        protected transient adams.multiprocess.WekaCrossValidationExecution m_CrossValidation
        the current evaluation.
      • m_Stopped

        protected boolean m_Stopped
        whether the execution was stopped.
      • m_FlowContext

        protected transient adams.flow.core.Actor m_FlowContext
        the flow context.
      • m_CountBefore

        protected int m_CountBefore
        the count before.
      • m_CountAfter

        protected int m_CountAfter
        the count after.
    • Constructor Detail

      • PredictionErrorIQR

        public PredictionErrorIQR()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the object.
        Specified by:
        globalInfo in interface adams.core.GlobalInfoSupporter
        Specified by:
        globalInfo in class adams.core.option.AbstractOptionHandler
        Returns:
        a description suitable for displaying in the gui
      • defineOptions

        public void defineOptions()
        Specified by:
        defineOptions in interface adams.core.option.OptionHandler
        Overrides:
        defineOptions in class adams.core.option.AbstractOptionHandler
      • adjustArrays

        protected void adjustArrays​(int length)
        Adjusts the arrays to the new length.
        Parameters:
        length - the new length
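
Several options (folds, IQR multiplier, iterations, non-removal iterations) are arrays with one entry per classifier, so they have to stay the same length as the classifier array. How `adjustArrays` fills new slots is not documented here. The sketch below shows one plausible approach, keeping existing values and padding with a default, purely as an assumption; `ParallelArrays.resize` is a hypothetical helper.

```java
import java.util.Arrays;

/** Sketch of keeping a per-classifier array in sync; the padding rule is an assumption. */
final class ParallelArrays {

  /** Truncates or extends the array; new slots receive the given default value. */
  static int[] resize(int[] values, int length, int defaultValue) {
    int[] result = Arrays.copyOf(values, length);
    for (int i = values.length; i < length; i++)
      result[i] = defaultValue;
    return result;
  }
}
```

For example, growing `{5}` to length 3 with default 10 gives `{5, 10, 10}`, while shrinking `{5, 7, 9}` to length 2 gives `{5, 7}`.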
      • setClassifiers

        public void setClassifiers​(weka.classifiers.Classifier[] value)
        Sets the classifiers to use internally.
        Parameters:
        value - the classifiers
      • getClassifiers

        public weka.classifiers.Classifier[] getClassifiers()
        Returns the classifiers to use internally.
        Returns:
        the classifiers
      • classifiersTipText

        public String classifiersTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the gui
      • setNumFolds

        public void setNumFolds​(adams.core.base.BaseInteger[] value)
        Sets the number of cross-validation folds per classifier.
        Parameters:
        value - the folds
      • getNumFolds

        public adams.core.base.BaseInteger[] getNumFolds()
        Returns the number of cross-validation folds per classifier.
        Returns:
        the folds
      • numFoldsTipText

        public String numFoldsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the gui
      • setIQRMultiplier

        public void setIQRMultiplier​(adams.core.base.BaseDouble[] value)
        Sets the IQR multipliers per classifier.
        Parameters:
        value - the multipliers
      • getIQRMultiplier

        public adams.core.base.BaseDouble[] getIQRMultiplier()
        Returns the IQR multipliers per classifier.
        Returns:
        the multipliers
      • IQRMultiplierTipText

        public String IQRMultiplierTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the gui
      • setNumIterations

        public void setNumIterations​(adams.core.base.BaseInteger[] value)
        Sets the number of iterations per classifier.
        Parameters:
        value - the iterations
      • getNumIterations

        public adams.core.base.BaseInteger[] getNumIterations()
        Returns the number of iterations per classifier.
        Returns:
        the iterations
      • numIterationsTipText

        public String numIterationsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the gui
      • setMaxNonRemovalIterations

        public void setMaxNonRemovalIterations​(adams.core.base.BaseInteger[] value)
        Sets the maximum number of non-removal iterations per classifier.
        Parameters:
        value - the max
      • getMaxNonRemovalIterations

        public adams.core.base.BaseInteger[] getMaxNonRemovalIterations()
        Returns the maximum number of non-removal iterations per classifier.
        Returns:
        the max
      • maxNonRemovalIterationsTipText

        public String maxNonRemovalIterationsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the gui
      • setSeed

        public void setSeed​(long value)
        Sets the seed value.
        Specified by:
        setSeed in interface adams.core.Randomizable
        Parameters:
        value - the seed
      • getSeed

        public long getSeed()
        Returns the seed value.
        Specified by:
        getSeed in interface adams.core.Randomizable
        Returns:
        the seed
      • seedTipText

        public String seedTipText()
        Returns the tip text for this property.
        Specified by:
        seedTipText in interface adams.core.Randomizable
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setUseAbsoluteError

        public void setUseAbsoluteError​(boolean value)
        Sets whether to use an absolute error (i.e., no direction).
        Parameters:
        value - true if to use absolute error
      • getUseAbsoluteError

        public boolean getUseAbsoluteError()
        Returns whether to use an absolute error (i.e., no direction).
        Returns:
        true if to use absolute error
      • useAbsoluteErrorTipText

        public String useAbsoluteErrorTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setNumThreads

        public void setNumThreads​(int value)
        Sets the number of threads to use for cross-validation.
        Specified by:
        setNumThreads in interface adams.core.ThreadLimiter
        Parameters:
        value - the number of threads; see the -num-threads option (property: numThreads) for how the value is interpreted
      • getNumThreads

        public int getNumThreads()
        Returns the number of threads to use for cross-validation.
        Specified by:
        getNumThreads in interface adams.core.ThreadLimiter
        Returns:
        the number of threads; see the -num-threads option (property: numThreads) for how the value is interpreted
      • numThreadsTipText

        public String numThreadsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • preCheck

        protected void preCheck​(weka.core.Instances data)
        Performs some pre-checks whether the data is actually suitable.
        Overrides:
        preCheck in class AbstractPostProcessor
        Parameters:
        data - the dataset to check
      • cleanData

        protected weka.core.Instances cleanData​(weka.core.Instances data,
                                                int classifierIndex,
                                                weka.classifiers.Classifier classifier,
                                                int iteration,
                                                int numFolds,
                                                long seed,
                                                double iqrMultiplier)
        Cleans the data using the specified classifier and parameters.
        Parameters:
        data - the data to clean
        classifierIndex - the index of the classifier
        classifier - the classifier to cross-validate and obtain predictions from
        iteration - the current iteration
        numFolds - the cross-validation folds
        seed - the seed for randomizing the data for cross-validation
        iqrMultiplier - the multiplier for the IQR outliers
        Returns:
        the cleaned up data
      • iterate

        protected weka.core.Instances iterate​(weka.core.Instances data,
                                              int classifierIndex,
                                              weka.classifiers.Classifier classifier,
                                              int numFolds,
                                              long seed,
                                              double iqrMultiplier,
                                              int numIterations,
                                              int maxNonRemovalIterations)
        Repeatedly cleans the data using the specified classifier and parameters until a stop criterion is met.
        Parameters:
        data - the data to clean
        classifierIndex - the index of the classifier
        classifier - the classifier to cross-validate and obtain predictions from
        numFolds - the cross-validation folds
        seed - the seed for randomizing the data for cross-validation
        iqrMultiplier - the multiplier for the IQR outliers
        numIterations - the number of iterations to perform (ignored if 0)
        maxNonRemovalIterations - the maximum number of iterations with no outliers being removed before stopping (ignored if 0)
        Returns:
        the cleaned up data
      • performPostProcess

        protected weka.core.Instances performPostProcess​(weka.core.Instances data)
        Performs the actual postprocessing.
        Specified by:
        performPostProcess in class AbstractPostProcessor
        Parameters:
        data - the dataset to process
        Returns:
        the processed dataset
      • performPostProcess

        protected weka.core.Instance performPostProcess​(weka.core.Instance data)
        Performs the actual postprocessing.
        Specified by:
        performPostProcess in class AbstractPostProcessor
        Parameters:
        data - the instance to process
        Returns:
        the processed instance
      • getDetails

        public adams.data.spreadsheet.SpreadSheet getDetails()
        Returns details for the cleaner.
        Specified by:
        getDetails in interface PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>
        Returns:
        the details
      • stopExecution

        public void stopExecution()
        Stops the execution. No message set.
        Specified by:
        stopExecution in interface adams.core.Stoppable
      • isStopped

        public boolean isStopped()
        Whether the execution has been stopped.
        Specified by:
        isStopped in interface adams.core.StoppableWithFeedback
        Returns:
        true if stopped
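
The `stopExecution` and `isStopped` pair follows the usual cooperative stop pattern: a flag is set from outside and the long-running loop checks it between units of work. A minimal sketch of that pattern, with a hypothetical `StoppableLoop` class (the `volatile` modifier is a choice made for this sketch so that the flag is visible across threads):

```java
/** Sketch of a cooperative stop flag; hypothetical class, not the ADAMS implementation. */
final class StoppableLoop {

  private volatile boolean stopped;

  /** Requests that the running loop ends after its current step. */
  void stopExecution() {
    stopped = true;
  }

  /** Whether a stop has been requested. */
  boolean isStopped() {
    return stopped;
  }

  /** Runs up to maxSteps steps, checking the flag before each; returns the steps done. */
  int run(int maxSteps, Runnable step) {
    stopped = false;
    int done = 0;
    while (done < maxSteps && !stopped) {
      step.run();
      done++;
    }
    return done;
  }
}
```

A stop request takes effect only at the next check, so a step that is already running (for instance one cross-validation) still has to finish or be stopped by its own means.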
      • cleanUp

        public void cleanUp()
        Cleans up data structures, frees up memory.
        Specified by:
        cleanUp in interface adams.core.CleanUpHandler