Class PredictionErrorIQR
- java.lang.Object
  - adams.core.logging.LoggingObject
    - adams.core.logging.CustomLoggingLevelObject
      - adams.core.option.AbstractOptionHandler
        - adams.data.postprocessor.instances.AbstractPostProcessor
          - adams.data.postprocessor.instances.PredictionErrorIQR
- All Implemented Interfaces:
adams.core.CleanUpHandler, adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.Randomizable, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.ThreadLimiter, PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>, adams.flow.core.FlowContextHandler, Serializable, Comparable
public class PredictionErrorIQR extends AbstractPostProcessor implements adams.core.Randomizable, adams.core.StoppableWithFeedback, adams.core.ThreadLimiter, PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>
Post-processor that removes outliers using a coarse IQR approach on the prediction errors of one or more classifiers.
Parameters:
- list of classifiers
- for each classifier: the number of folds, an IQR multiplier, and the number of iterations or the number of consecutive non-removal iterations before stopping.
Algorithm:
for each classifier
  loop
    cross-validate, get the predictions, remove all examples where error > percentile75 + IQR * multiplier of the errors
    stop once num_iterations iterations are done, or after the configured number of consecutive iterations with zero removals
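The thresholding step of the algorithm above can be sketched in plain Java, without ADAMS or Weka on the classpath. This is only an illustration under stated assumptions: the class `IqrErrorFilterSketch`, its method names, the linear-interpolation percentile and the sign convention `predicted - actual` are hypothetical and not taken from the ADAMS source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of ADAMS. */
final class IqrErrorFilterSketch {

  /** Percentile by linear interpolation on a sorted array, p in [0, 1] (assumed definition). */
  static double percentile(double[] sorted, double p) {
    double pos = p * (sorted.length - 1);
    int lo = (int) Math.floor(pos);
    int hi = (int) Math.ceil(pos);
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
  }

  /** Upper cut-off as described above: percentile75 + IQR * multiplier. */
  static double upperThreshold(double[] errors, double multiplier) {
    if (errors.length == 0)
      throw new IllegalArgumentException("No errors supplied");
    double[] sorted = errors.clone();
    Arrays.sort(sorted);
    double q1 = percentile(sorted, 0.25);
    double q3 = percentile(sorted, 0.75);
    return q3 + (q3 - q1) * multiplier;
  }

  /** Returns the indices of the examples whose error does not exceed the cut-off. */
  static List<Integer> keepIndices(double[] actual, double[] predicted,
                                   double multiplier, boolean useAbsoluteError) {
    double[] errors = new double[actual.length];
    for (int i = 0; i < actual.length; i++) {
      double error = predicted[i] - actual[i];  // assumed sign convention
      errors[i] = useAbsoluteError ? Math.abs(error) : error;
    }
    double threshold = upperThreshold(errors, multiplier);
    List<Integer> keep = new ArrayList<>();
    for (int i = 0; i < errors.length; i++) {
      if (errors[i] <= threshold)
        keep.add(i);
    }
    return keep;
  }

  public static void main(String[] args) {
    double[] actual = {0, 0, 0, 0, 0};
    double[] predicted = {1, 2, 3, 4, 100};
    System.out.println(keepIndices(actual, predicted, 0.5, true));
  }
}
```

With the errors 1, 2, 3, 4 and 100 and a multiplier of 0.5, this sketch gets a 25th percentile of 2 and a 75th percentile of 4, so the cut-off is 4 + 2 * 0.5 = 5 and only the last example is dropped.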
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
    The logging level for outputting errors and debugging output.
    default: WARNING
    min-user-mode: Expert
-classifier <weka.classifiers.Classifier> [-classifier ...] (property: classifiers)
    The classifiers to cross-validate internally, using their prediction errors to determine outliers.
    default: weka.classifiers.rules.ZeroR
-num-folds <adams.core.base.BaseInteger> [-num-folds ...] (property: numFolds)
    The number of cross-validation folds per classifier.
    default: 10
-iqr-multiplier <adams.core.base.BaseDouble> [-iqr-multiplier ...] (property: IQRMultiplier)
    The multiplier for the IQR filter to determine outlier values.
    default: 0.1
-num-iterations <adams.core.base.BaseInteger> [-num-iterations ...] (property: numIterations)
    The number of iterations per classifier.
    default: 0
-max-non-removal-iterations <adams.core.base.BaseInteger> [-max-non-removal-iterations ...] (property: maxNonRemovalIterations)
    The maximum number of non-removal iterations per classifier.
    default: 2
-seed <long> (property: seed)
    The seed value for the randomization.
    default: 1
-use-absolute-error <boolean> (property: useAbsoluteError)
    If set to true, then the error will be absolute (no direction).
    default: true
-num-threads <int> (property: numThreads)
    The number of threads to use for parallel execution; > 0: specific number of cores to use (capped by actual number of cores available, 1 = sequential execution); = 0: number of cores; < 0: number of free cores (e.g., -2 means 2 free cores; minimum of one core is used); overrides the value defined by the fold generator scheme.
    default: 1
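The rules in the -num-threads description can be read as a small mapping from the option value to an effective core count. The sketch below only restates that description in Java; `ThreadCountSketch` and `resolve` are hypothetical names, and the actual ADAMS implementation may differ.

```java
/** Hypothetical helper for illustration; not part of ADAMS. */
final class ThreadCountSketch {

  /**
   * Maps the option value to the number of cores to use.
   *
   * @param numThreads the value of the -num-threads option
   * @param availableCores the number of cores of the machine, assumed to be at least 1
   */
  static int resolve(int numThreads, int availableCores) {
    if (numThreads > 0)
      return Math.min(numThreads, availableCores);  // specific number, capped by the available cores
    if (numThreads == 0)
      return availableCores;                        // use all cores
    return Math.max(1, availableCores + numThreads); // leave |numThreads| cores free, but use at least one
  }

  public static void main(String[] args) {
    int cores = Runtime.getRuntime().availableProcessors();
    System.out.println("-num-threads -2 on this machine: " + resolve(-2, cores));
  }
}
```

On a machine with 8 cores, this reading gives 4 for a value of 4, 8 for a value of 16 or 0, 6 for a value of -2, and 1 for a value of -10.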
- Author:
- fracpete (fracpete at waikato dot ac dot nz), Dale (dale at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
Field Summary
Fields (modifier and type, field, description):
- protected weka.classifiers.Classifier[] m_Classifiers: the classifiers to use for internal cross-validation.
- protected int m_CountAfter: the count after.
- protected int m_CountBefore: the count before.
- protected adams.multiprocess.WekaCrossValidationExecution m_CrossValidation: the current evaluation.
- protected adams.flow.core.Actor m_FlowContext: the flow context.
- protected adams.core.base.BaseDouble[] m_IQRMultiplier: the IQR multiplier per classifier.
- protected adams.core.base.BaseInteger[] m_MaxNonRemovalIterations: the maximum number of non-removal iterations per classifier.
- protected adams.core.base.BaseInteger[] m_NumFolds: the number of folds per classifier.
- protected adams.core.base.BaseInteger[] m_NumIterations: the number of iterations per classifier.
- protected int m_NumThreads: the number of threads to use for parallel execution.
- protected Random m_Random: the random number generator in use.
- protected long m_Seed: the seed value.
- protected boolean m_Stopped: whether the execution was stopped.
- protected boolean m_UseAbsoluteError: whether to use absolute errors.
-
Constructor Summary
Constructors (constructor, description):
- PredictionErrorIQR()
-
Method Summary
Methods (modifier and type, method, description):
- protected void adjustArrays(int length): Adjusts the arrays to the new length.
- String classifiersTipText(): Returns the tip text for this property.
- protected weka.core.Instances cleanData(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int iteration, int numFolds, long seed, double iqrMultiplier): Cleans the data using the specified classifier and parameters.
- void cleanUp(): Cleans up data structures, frees up memory.
- void defineOptions()
- weka.classifiers.Classifier[] getClassifiers(): Returns the classifiers to use internally.
- adams.data.spreadsheet.SpreadSheet getDetails(): Returns details for the cleaner.
- adams.core.base.BaseDouble[] getIQRMultiplier(): Returns the IQR multipliers per classifier.
- adams.core.base.BaseInteger[] getMaxNonRemovalIterations(): Returns the maximum number of non-removal iterations per classifier.
- adams.core.base.BaseInteger[] getNumFolds(): Returns the number of cross-validation folds per classifier.
- adams.core.base.BaseInteger[] getNumIterations(): Returns the number of iterations per classifier.
- int getNumThreads(): Returns the number of threads to use for cross-validation.
- long getSeed(): Returns the seed value.
- boolean getUseAbsoluteError(): Returns whether to use an absolute error (i.e., no direction).
- String globalInfo(): Returns a string describing the object.
- String IQRMultiplierTipText(): Returns the tip text for this property.
- boolean isStopped(): Whether the execution has been stopped.
- protected weka.core.Instances iterate(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int numFolds, long seed, double iqrMultiplier, int numIterations, int maxNonRemovalIterations): Iteratively cleans the data using the specified classifier and parameters.
- String maxNonRemovalIterationsTipText(): Returns the tip text for this property.
- String numFoldsTipText(): Returns the tip text for this property.
- String numIterationsTipText(): Returns the tip text for this property.
- String numThreadsTipText(): Returns the tip text for this property.
- protected weka.core.Instance performPostProcess(weka.core.Instance data): Performs the actual postprocessing.
- protected weka.core.Instances performPostProcess(weka.core.Instances data): Performs the actual postprocessing.
- protected void preCheck(weka.core.Instances data): Performs some pre-checks whether the data is actually suitable.
- String seedTipText(): Returns the tip text for this property.
- void setClassifiers(weka.classifiers.Classifier[] value): Sets the classifiers to use internally.
- void setIQRMultiplier(adams.core.base.BaseDouble[] value): Sets the IQR multipliers per classifier.
- void setMaxNonRemovalIterations(adams.core.base.BaseInteger[] value): Sets the maximum number of non-removal iterations per classifier.
- void setNumFolds(adams.core.base.BaseInteger[] value): Sets the number of cross-validation folds per classifier.
- void setNumIterations(adams.core.base.BaseInteger[] value): Sets the number of iterations per classifier.
- void setNumThreads(int value): Sets the number of threads to use for cross-validation.
- void setSeed(long value): Sets the seed value.
- void setUseAbsoluteError(boolean value): Sets whether to use an absolute error (i.e., no direction).
- void stopExecution(): Stops the execution.
- String useAbsoluteErrorTipText(): Returns the tip text for this property.
-
Methods inherited from class adams.data.postprocessor.instances.AbstractPostProcessor
compareTo, equals, forCommandLine, forName, getEvaluators, getFlowContext, postProcess, postProcess, preCheck, setFlowContext, shallowCopy, shallowCopy
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Field Detail
-
m_Classifiers
protected weka.classifiers.Classifier[] m_Classifiers
the classifiers to use for internal cross-validation.
-
m_NumFolds
protected adams.core.base.BaseInteger[] m_NumFolds
the number of folds per classifier.
-
m_IQRMultiplier
protected adams.core.base.BaseDouble[] m_IQRMultiplier
the IQR multiplier per classifier.
-
m_NumIterations
protected adams.core.base.BaseInteger[] m_NumIterations
the number of iterations per classifier.
-
m_MaxNonRemovalIterations
protected adams.core.base.BaseInteger[] m_MaxNonRemovalIterations
the maximum number of non-removal iterations per classifier.
-
m_Seed
protected long m_Seed
the seed value.
-
m_Random
protected transient Random m_Random
the random number generator in use.
-
m_UseAbsoluteError
protected boolean m_UseAbsoluteError
whether to use absolute errors.
-
m_NumThreads
protected int m_NumThreads
the number of threads to use for parallel execution.
-
m_CrossValidation
protected transient adams.multiprocess.WekaCrossValidationExecution m_CrossValidation
the current evaluation.
-
m_Stopped
protected boolean m_Stopped
whether the execution was stopped.
-
m_FlowContext
protected transient adams.flow.core.Actor m_FlowContext
the flow context.
-
m_CountBefore
protected int m_CountBefore
the count before.
-
m_CountAfter
protected int m_CountAfter
the count after.
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface adams.core.GlobalInfoSupporter
- Specified by:
globalInfo in class adams.core.option.AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
- Specified by:
defineOptions in interface adams.core.option.OptionHandler
- Overrides:
defineOptions in class adams.core.option.AbstractOptionHandler
-
adjustArrays
protected void adjustArrays(int length)
Adjusts the arrays to the new length.
- Parameters:
length - the new length
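The per-classifier options (folds, multipliers, iteration counts) are arrays, so they need to match the number of classifiers. How adjustArrays fills new slots is not stated here; the following sketch assumes that surplus entries are dropped and missing ones are padded with a default value. `AdjustArraysSketch` and `adjust` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of ADAMS. */
final class AdjustArraysSketch {

  /**
   * Returns a copy of the array with the requested length.
   * Assumption: surplus entries are dropped, new entries get the default value.
   */
  static <T> T[] adjust(T[] array, int length, T defaultValue) {
    T[] result = Arrays.copyOf(array, length);  // truncates, or pads with null
    for (int i = array.length; i < length; i++)
      result[i] = defaultValue;
    return result;
  }

  public static void main(String[] args) {
    Integer[] folds = {10, 5};
    // three classifiers, but only two fold settings: pad with the assumed default of 10
    System.out.println(Arrays.toString(adjust(folds, 3, 10)));
  }
}
```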
-
setClassifiers
public void setClassifiers(weka.classifiers.Classifier[] value)
Sets the classifiers to use internally.
- Parameters:
value - the classifiers
-
getClassifiers
public weka.classifiers.Classifier[] getClassifiers()
Returns the classifiers to use internally.
- Returns:
- the classifiers
-
classifiersTipText
public String classifiersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI
-
setNumFolds
public void setNumFolds(adams.core.base.BaseInteger[] value)
Sets the number of cross-validation folds per classifier.
- Parameters:
value - the folds
-
getNumFolds
public adams.core.base.BaseInteger[] getNumFolds()
Returns the number of cross-validation folds per classifier.
- Returns:
- the folds
-
numFoldsTipText
public String numFoldsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI
-
setIQRMultiplier
public void setIQRMultiplier(adams.core.base.BaseDouble[] value)
Sets the IQR multipliers per classifier.
- Parameters:
value - the multipliers
-
getIQRMultiplier
public adams.core.base.BaseDouble[] getIQRMultiplier()
Returns the IQR multipliers per classifier.
- Returns:
- the multipliers
-
IQRMultiplierTipText
public String IQRMultiplierTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI
-
setNumIterations
public void setNumIterations(adams.core.base.BaseInteger[] value)
Sets the number of iterations per classifier.
- Parameters:
value - the iterations
-
getNumIterations
public adams.core.base.BaseInteger[] getNumIterations()
Returns the number of iterations per classifier.
- Returns:
- the iterations
-
numIterationsTipText
public String numIterationsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI
-
setMaxNonRemovalIterations
public void setMaxNonRemovalIterations(adams.core.base.BaseInteger[] value)
Sets the maximum number of non-removal iterations per classifier.
- Parameters:
value - the max
-
getMaxNonRemovalIterations
public adams.core.base.BaseInteger[] getMaxNonRemovalIterations()
Returns the maximum number of non-removal iterations per classifier.
- Returns:
- the max
-
maxNonRemovalIterationsTipText
public String maxNonRemovalIterationsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI
-
setSeed
public void setSeed(long value)
Sets the seed value.
- Specified by:
setSeed in interface adams.core.Randomizable
- Parameters:
value - the seed
-
getSeed
public long getSeed()
Returns the seed value.
- Specified by:
getSeed in interface adams.core.Randomizable
- Returns:
- the seed
-
seedTipText
public String seedTipText()
Returns the tip text for this property.
- Specified by:
seedTipText in interface adams.core.Randomizable
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUseAbsoluteError
public void setUseAbsoluteError(boolean value)
Sets whether to use an absolute error (i.e., no direction).
- Parameters:
value - true if to use absolute error
-
getUseAbsoluteError
public boolean getUseAbsoluteError()
Returns whether to use an absolute error (i.e., no direction).
- Returns:
- true if to use absolute error
-
useAbsoluteErrorTipText
public String useAbsoluteErrorTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumThreads
public void setNumThreads(int value)
Sets the number of threads to use for cross-validation.
- Specified by:
setNumThreads in interface adams.core.ThreadLimiter
- Parameters:
value - the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
-
getNumThreads
public int getNumThreads()
Returns the number of threads to use for cross-validation.
- Specified by:
getNumThreads in interface adams.core.ThreadLimiter
- Returns:
- the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
-
numThreadsTipText
public String numThreadsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
preCheck
protected void preCheck(weka.core.Instances data)
Performs some pre-checks whether the data is actually suitable.
- Overrides:
preCheck in class AbstractPostProcessor
- Parameters:
data - the dataset to check
-
cleanData
protected weka.core.Instances cleanData(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int iteration, int numFolds, long seed, double iqrMultiplier)
Cleans the data using the specified classifier and parameters.
- Parameters:
data - the data to clean
classifierIndex - the index of the classifier
iteration - the current iteration
classifier - the classifier to cross-validate and obtain predictions from
numFolds - the cross-validation folds
seed - the seed for randomizing the data for cross-validation
iqrMultiplier - the multiplier for the IQR outliers
- Returns:
- the cleaned up data
-
iterate
protected weka.core.Instances iterate(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int numFolds, long seed, double iqrMultiplier, int numIterations, int maxNonRemovalIterations)
Iteratively cleans the data using the specified classifier and parameters.
- Parameters:
data - the data to clean
classifierIndex - the index of the classifier
classifier - the classifier to cross-validate and obtain predictions from
numFolds - the cross-validation folds
seed - the seed for randomizing the data for cross-validation
iqrMultiplier - the multiplier for the IQR outliers
numIterations - the number of iterations to perform (ignored if 0)
maxNonRemovalIterations - the maximum number of iterations with no outliers being removed before stopping (ignored if 0)
- Returns:
- the cleaned up data
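The two stop criteria described by the numIterations and maxNonRemovalIterations parameters can be sketched as a plain loop. This is an assumption-laden illustration, not the ADAMS code: `IterateSketch`, `run` and the `removedInIteration` callback (standing in for one cleaning pass) are hypothetical, and rejecting the case where both criteria are 0 is a safeguard of the sketch only.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical helper for illustration; not part of ADAMS. */
final class IterateSketch {

  /**
   * Runs cleaning passes until one of the stop criteria is met.
   *
   * @param removedInIteration stand-in for one cleaning pass: maps the 0-based
   *        iteration index to the number of examples removed in that pass
   * @param numIterations the number of passes to perform (ignored if 0)
   * @param maxNonRemovalIterations the number of consecutive passes without removals
   *        before stopping (ignored if 0)
   * @return the number of passes performed
   */
  static int run(IntUnaryOperator removedInIteration, int numIterations, int maxNonRemovalIterations) {
    if (numIterations <= 0 && maxNonRemovalIterations <= 0)
      throw new IllegalArgumentException("At least one stop criterion is required in this sketch");
    int iteration = 0;
    int nonRemoval = 0;
    while (true) {
      int removed = removedInIteration.applyAsInt(iteration);
      iteration++;
      // count consecutive passes without removals, reset on any removal
      nonRemoval = (removed == 0) ? nonRemoval + 1 : 0;
      if (numIterations > 0 && iteration >= numIterations)
        break;
      if (maxNonRemovalIterations > 0 && nonRemoval >= maxNonRemovalIterations)
        break;
    }
    return iteration;
  }

  public static void main(String[] args) {
    int[] removals = {3, 1, 0, 0, 5};
    System.out.println("passes: " + run(i -> removals[i], 0, 2));
  }
}
```

In the usage example, the third and fourth passes remove nothing, so with maxNonRemovalIterations set to 2 the loop stops after four passes and the fifth is never run.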
-
performPostProcess
protected weka.core.Instances performPostProcess(weka.core.Instances data)
Performs the actual postprocessing.
- Specified by:
performPostProcess in class AbstractPostProcessor
- Parameters:
data - the dataset to process
- Returns:
- the processed dataset
-
performPostProcess
protected weka.core.Instance performPostProcess(weka.core.Instance data)
Performs the actual postprocessing.
- Specified by:
performPostProcess in class AbstractPostProcessor
- Parameters:
data - the instance to process
- Returns:
- the processed instance
-
getDetails
public adams.data.spreadsheet.SpreadSheet getDetails()
Returns details for the cleaner.
- Specified by:
getDetails in interface PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>
- Returns:
- the details
-
stopExecution
public void stopExecution()
Stops the execution. No message set.
- Specified by:
stopExecution in interface adams.core.Stoppable
-
isStopped
public boolean isStopped()
Whether the execution has been stopped.
- Specified by:
isStopped in interface adams.core.StoppableWithFeedback
- Returns:
- true if stopped
-
cleanUp
public void cleanUp()
Cleans up data structures, frees up memory.
- Specified by:
cleanUp in interface adams.core.CleanUpHandler