Class PredictionErrorIQR
- java.lang.Object
  - adams.core.logging.LoggingObject
    - adams.core.logging.CustomLoggingLevelObject
      - adams.core.option.AbstractOptionHandler
        - adams.data.postprocessor.instances.AbstractPostProcessor
          - adams.data.postprocessor.instances.PredictionErrorIQR
- All Implemented Interfaces:
adams.core.CleanUpHandler, adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.Randomizable, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.ThreadLimiter, PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>, adams.flow.core.FlowContextHandler, Serializable, Comparable
public class PredictionErrorIQR extends AbstractPostProcessor implements adams.core.Randomizable, adams.core.StoppableWithFeedback, adams.core.ThreadLimiter, PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>
Post-processor that removes outliers using a coarse IQR approach on the prediction errors of one or more classifiers.
Parameters:
- list of classifiers
- for each classifier: number of folds, an IQR multiplier, the number of iterations, and the number of consecutive non-removal iterations before stopping.
Algorithm:
for each classifier
  loop
    cross-validate, get the predictions, remove all examples where error > percentile75 + IQR * multiplier of the errors
    stop once num_iterations passes are done, or after the configured number of consecutive passes with zero removals
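The removal rule in the algorithm above can be sketched in plain Java. This is a minimal illustration, not the ADAMS implementation: the class `IqrThresholdSketch`, its method names, and the linear-interpolation percentile are assumptions made for the example. The source only states the rule error > percentile75 + IQR * multiplier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the documented cut-off rule. */
final class IqrThresholdSketch {

  /** Percentile by linear interpolation between closest ranks (assumed method). */
  static double percentile(double[] sorted, double p) {
    if (sorted.length == 0)
      throw new IllegalArgumentException("No errors supplied");
    double pos = p * (sorted.length - 1);
    int lower = (int) Math.floor(pos);
    int upper = (int) Math.ceil(pos);
    return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
  }

  /** Upper cut-off: 75th percentile plus multiplier times the IQR. */
  static double threshold(double[] errors, double multiplier) {
    double[] sorted = errors.clone();
    Arrays.sort(sorted);
    double q1 = percentile(sorted, 0.25);
    double q3 = percentile(sorted, 0.75);
    return q3 + (q3 - q1) * multiplier;
  }

  /** Returns the indices of the rows that survive (error <= cut-off). */
  static List<Integer> keep(double[] errors, double multiplier) {
    double cutoff = threshold(errors, multiplier);
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < errors.length; i++) {
      if (errors[i] <= cutoff)
        result.add(i);
    }
    return result;
  }
}
```

For the errors {1, 2, 3, 4, 100} and a multiplier of 1.5, the quartiles under this interpolation are 2 and 4, so the cut-off is 4 + 2 * 1.5 = 7 and only the last row is dropped.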
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
    The logging level for outputting errors and debugging output.
    default: WARNING
    min-user-mode: Expert
-classifier <weka.classifiers.Classifier> [-classifier ...] (property: classifiers)
    The classifiers to cross-validate internally, using their prediction errors to determine outliers.
    default: weka.classifiers.rules.ZeroR
-num-folds <adams.core.base.BaseInteger> [-num-folds ...] (property: numFolds)
    The number of cross-validation folds per classifier.
    default: 10
-iqr-multiplier <adams.core.base.BaseDouble> [-iqr-multiplier ...] (property: IQRMultiplier)
    The multiplier for the IQR filter to determine outlier values.
    default: 0.1
-num-iterations <adams.core.base.BaseInteger> [-num-iterations ...] (property: numIterations)
    The number of iterations per classifier.
    default: 0
-max-non-removal-iterations <adams.core.base.BaseInteger> [-max-non-removal-iterations ...] (property: maxNonRemovalIterations)
    The maximum number of non-removal iterations per classifier.
    default: 2
-seed <long> (property: seed)
    The seed value for the randomization.
    default: 1
-use-absolute-error <boolean> (property: useAbsoluteError)
    If set to true, then the error will be absolute (no direction).
    default: true
-num-threads <int> (property: numThreads)
    The number of threads to use for parallel execution; > 0: specific number of cores to use (capped by actual number of cores available, 1 = sequential execution); = 0: number of cores; < 0: number of free cores (eg -2 means 2 free cores; minimum of one core is used); overrides the value defined by the fold generator scheme.
    default: 1
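The rules in the -num-threads option text can be read as a small mapping from the configured value to an effective thread count. The sketch below follows that text only; the class `ThreadCountSketch` and the method `resolve` are hypothetical names, and the actual ADAMS logic may differ.

```java
/** Hypothetical helper mirroring the -num-threads option text. */
final class ThreadCountSketch {

  /**
   * Maps the configured value to an effective number of threads.
   *
   * @param numThreads     the configured value (as in -num-threads)
   * @param availableCores the number of cores on the machine
   */
  static int resolve(int numThreads, int availableCores) {
    if (numThreads > 0)
      return Math.min(numThreads, availableCores);  // capped by available cores
    if (numThreads == 0)
      return availableCores;                        // use all cores
    // negative: leave |numThreads| cores free, but use at least one
    return Math.max(1, availableCores + numThreads);
  }
}
```

In practice the second argument would typically come from `Runtime.getRuntime().availableProcessors()`. On an 8-core machine, -2 resolves to 6 threads under this reading, and 16 is capped at 8.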
- Author: fracpete (fracpete at waikato dot ac dot nz), Dale (dale at waikato dot ac dot nz)
- See Also: Serialized Form
Field Summary
Fields (modifier and type, field, description):
- protected weka.classifiers.Classifier[] m_Classifiers: the classifiers to use for internal cross-validation.
- protected int m_CountAfter: the count after.
- protected int m_CountBefore: the count before.
- protected adams.multiprocess.WekaCrossValidationExecution m_CrossValidation: the current evaluation.
- protected adams.flow.core.Actor m_FlowContext: the flow context.
- protected adams.core.base.BaseDouble[] m_IQRMultiplier: the IQR multiplier per classifier.
- protected adams.core.base.BaseInteger[] m_MaxNonRemovalIterations: the maximum number of non-removal iterations per classifier.
- protected adams.core.base.BaseInteger[] m_NumFolds: the number of folds per classifier.
- protected adams.core.base.BaseInteger[] m_NumIterations: the number of iterations per classifier.
- protected int m_NumThreads: the number of threads to use for parallel execution.
- protected Random m_Random: the random number generator in use.
- protected long m_Seed: the seed value.
- protected boolean m_Stopped: whether the execution was stopped.
- protected boolean m_UseAbsoluteError: whether to use absolute errors.
-
Constructor Summary
Constructors:
- PredictionErrorIQR()
-
Method Summary
All Methods / Instance Methods / Concrete Methods (modifier and type, method, description):
- protected void adjustArrays(int length): Adjusts the arrays to the new length.
- String classifiersTipText(): Returns the tip text for this property.
- protected weka.core.Instances cleanData(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int iteration, int numFolds, long seed, double iqrMultiplier): Cleans the data using the specified classifier and parameters.
- void cleanUp(): Cleans up data structures, frees up memory.
- void defineOptions()
- weka.classifiers.Classifier[] getClassifiers(): Returns the classifiers to use internally.
- adams.data.spreadsheet.SpreadSheet getDetails(): Returns details for the cleaner.
- adams.core.base.BaseDouble[] getIQRMultiplier(): Returns the IQR multipliers per classifier.
- adams.core.base.BaseInteger[] getMaxNonRemovalIterations(): Returns the maximum number of non-removal iterations per classifier.
- adams.core.base.BaseInteger[] getNumFolds(): Returns the number of cross-validation folds per classifier.
- adams.core.base.BaseInteger[] getNumIterations(): Returns the number of iterations per classifier.
- int getNumThreads(): Returns the number of threads to use for cross-validation.
- long getSeed(): Returns the seed value.
- boolean getUseAbsoluteError(): Returns whether to use an absolute error (ie no direction).
- String globalInfo(): Returns a string describing the object.
- String IQRMultiplierTipText(): Returns the tip text for this property.
- boolean isStopped(): Whether the execution has been stopped.
- protected weka.core.Instances iterate(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int numFolds, long seed, double iqrMultiplier, int numIterations, int maxNonRemovalIterations): Cleans the data using the specified classifier and parameters.
- String maxNonRemovalIterationsTipText(): Returns the tip text for this property.
- String numFoldsTipText(): Returns the tip text for this property.
- String numIterationsTipText(): Returns the tip text for this property.
- String numThreadsTipText(): Returns the tip text for this property.
- protected weka.core.Instance performPostProcess(weka.core.Instance data): Performs the actual postprocessing.
- protected weka.core.Instances performPostProcess(weka.core.Instances data): Performs the actual postprocessing.
- protected void preCheck(weka.core.Instances data): Performs some pre-checks whether the data is actually suitable.
- String seedTipText(): Returns the tip text for this property.
- void setClassifiers(weka.classifiers.Classifier[] value): Sets the classifiers to use internally.
- void setIQRMultiplier(adams.core.base.BaseDouble[] value): Sets the IQR multipliers per classifier.
- void setMaxNonRemovalIterations(adams.core.base.BaseInteger[] value): Sets the maximum number of non-removal iterations per classifier.
- void setNumFolds(adams.core.base.BaseInteger[] value): Sets the number of cross-validation folds per classifier.
- void setNumIterations(adams.core.base.BaseInteger[] value): Sets the number of iterations per classifier.
- void setNumThreads(int value): Sets the number of threads to use for cross-validation.
- void setSeed(long value): Sets the seed value.
- void setUseAbsoluteError(boolean value): Sets whether to use an absolute error (ie no direction).
- void stopExecution(): Stops the execution.
- String useAbsoluteErrorTipText(): Returns the tip text for this property.
-
Methods inherited from class adams.data.postprocessor.instances.AbstractPostProcessor
compareTo, equals, forCommandLine, forName, getEvaluators, getFlowContext, postProcess, postProcess, preCheck, setFlowContext, shallowCopy, shallowCopy
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Field Detail
-
m_Classifiers
protected weka.classifiers.Classifier[] m_Classifiers
the classifiers to use for internal cross-validation.
-
m_NumFolds
protected adams.core.base.BaseInteger[] m_NumFolds
the number of folds per classifier.
-
m_IQRMultiplier
protected adams.core.base.BaseDouble[] m_IQRMultiplier
the IQR multiplier per classifier.
-
m_NumIterations
protected adams.core.base.BaseInteger[] m_NumIterations
the number of iterations per classifier.
-
m_MaxNonRemovalIterations
protected adams.core.base.BaseInteger[] m_MaxNonRemovalIterations
the maximum number of non-removal iterations per classifier.
-
m_Seed
protected long m_Seed
the seed value.
-
m_Random
protected transient Random m_Random
the random number generator in use.
-
m_UseAbsoluteError
protected boolean m_UseAbsoluteError
whether to use absolute errors.
-
m_NumThreads
protected int m_NumThreads
the number of threads to use for parallel execution.
-
m_CrossValidation
protected transient adams.multiprocess.WekaCrossValidationExecution m_CrossValidation
the current evaluation.
-
m_Stopped
protected boolean m_Stopped
whether the execution was stopped.
-
m_FlowContext
protected transient adams.flow.core.Actor m_FlowContext
the flow context.
-
m_CountBefore
protected int m_CountBefore
the count before.
-
m_CountAfter
protected int m_CountAfter
the count after.
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by: globalInfo in interface adams.core.GlobalInfoSupporter
- Specified by: globalInfo in class adams.core.option.AbstractOptionHandler
- Returns: a description suitable for displaying in the gui
-
defineOptions
public void defineOptions()
- Specified by: defineOptions in interface adams.core.option.OptionHandler
- Overrides: defineOptions in class adams.core.option.AbstractOptionHandler
-
adjustArrays
protected void adjustArrays(int length)
Adjusts the arrays to the new length.
- Parameters: length - the new length
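The per-classifier options (folds, multipliers, iteration limits) are arrays, so they presumably need to track the number of classifiers. The source says only that adjustArrays changes the arrays to the new length; the sketch below assumes truncation when shrinking and padding with a default value when growing. The class `AdjustArraysSketch` and its `adjust` method are hypothetical and use plain `int` values instead of the ADAMS wrapper types.

```java
import java.util.Arrays;

/** Hypothetical illustration of resizing a per-classifier option array. */
final class AdjustArraysSketch {

  /**
   * Returns a copy of the array with the requested length.
   * Extra slots are filled with the default value (an assumption);
   * surplus entries are dropped.
   */
  static int[] adjust(int[] values, int length, int defaultValue) {
    int[] result = Arrays.copyOf(values, length);
    for (int i = values.length; i < length; i++)
      result[i] = defaultValue;
    return result;
  }
}
```

Under these assumptions, going from one classifier to three with a folds array of {10} and a default of 10 yields {10, 10, 10}.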
-
setClassifiers
public void setClassifiers(weka.classifiers.Classifier[] value)
Sets the classifiers to use internally.
- Parameters: value - the classifiers
-
getClassifiers
public weka.classifiers.Classifier[] getClassifiers()
Returns the classifiers to use internally.
- Returns: the classifiers
-
classifiersTipText
public String classifiersTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the gui
-
setNumFolds
public void setNumFolds(adams.core.base.BaseInteger[] value)
Sets the number of cross-validation folds per classifier.
- Parameters: value - the folds
-
getNumFolds
public adams.core.base.BaseInteger[] getNumFolds()
Returns the number of cross-validation folds per classifier.
- Returns: the folds
-
numFoldsTipText
public String numFoldsTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the gui
-
setIQRMultiplier
public void setIQRMultiplier(adams.core.base.BaseDouble[] value)
Sets the IQR multipliers per classifier.
- Parameters: value - the multipliers
-
getIQRMultiplier
public adams.core.base.BaseDouble[] getIQRMultiplier()
Returns the IQR multipliers per classifier.
- Returns: the multipliers
-
IQRMultiplierTipText
public String IQRMultiplierTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the gui
-
setNumIterations
public void setNumIterations(adams.core.base.BaseInteger[] value)
Sets the number of iterations per classifier.
- Parameters: value - the iterations
-
getNumIterations
public adams.core.base.BaseInteger[] getNumIterations()
Returns the number of iterations per classifier.
- Returns: the iterations
-
numIterationsTipText
public String numIterationsTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the gui
-
setMaxNonRemovalIterations
public void setMaxNonRemovalIterations(adams.core.base.BaseInteger[] value)
Sets the maximum number of non-removal iterations per classifier.
- Parameters: value - the max
-
getMaxNonRemovalIterations
public adams.core.base.BaseInteger[] getMaxNonRemovalIterations()
Returns the maximum number of non-removal iterations per classifier.
- Returns: the max
-
maxNonRemovalIterationsTipText
public String maxNonRemovalIterationsTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the gui
-
setSeed
public void setSeed(long value)
Sets the seed value.
- Specified by: setSeed in interface adams.core.Randomizable
- Parameters: value - the seed
-
getSeed
public long getSeed()
Returns the seed value.
- Specified by: getSeed in interface adams.core.Randomizable
- Returns: the seed
-
seedTipText
public String seedTipText()
Returns the tip text for this property.
- Specified by: seedTipText in interface adams.core.Randomizable
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUseAbsoluteError
public void setUseAbsoluteError(boolean value)
Sets whether to use an absolute error (ie no direction).
- Parameters: value - true if to use absolute error
-
getUseAbsoluteError
public boolean getUseAbsoluteError()
Returns whether to use an absolute error (ie no direction).
- Returns: true if to use absolute error
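The effect of the absolute-error flag can be shown with a small sketch. The class `PredictionErrorSketch` is hypothetical, and the sign convention (actual minus predicted) is an assumption; the source states only that the absolute error has no direction.

```java
/** Hypothetical illustration of signed versus absolute prediction error. */
final class PredictionErrorSketch {

  /**
   * Computes the error for one prediction.
   * The direction "actual - predicted" is assumed, not taken from the source.
   */
  static double error(double actual, double predicted, boolean useAbsoluteError) {
    double diff = actual - predicted;
    return useAbsoluteError ? Math.abs(diff) : diff;
  }
}
```

Because the documented removal rule compares errors against an upper cut-off only, signed errors would flag large deviations in one direction, while absolute errors treat over- and under-prediction alike. This reading follows from the rule as stated and is not confirmed by the source.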
-
useAbsoluteErrorTipText
public String useAbsoluteErrorTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumThreads
public void setNumThreads(int value)
Sets the number of threads to use for cross-validation.
- Specified by: setNumThreads in interface adams.core.ThreadLimiter
- Parameters: value - the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
-
getNumThreads
public int getNumThreads()
Returns the number of threads to use for cross-validation.
- Specified by: getNumThreads in interface adams.core.ThreadLimiter
- Returns: the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
-
numThreadsTipText
public String numThreadsTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
preCheck
protected void preCheck(weka.core.Instances data)
Performs some pre-checks whether the data is actually suitable.
- Overrides: preCheck in class AbstractPostProcessor
- Parameters: data - the dataset to check
-
cleanData
protected weka.core.Instances cleanData(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int iteration, int numFolds, long seed, double iqrMultiplier)
Cleans the data using the specified classifier and parameters.
- Parameters:
  data - the data to clean
  classifierIndex - the index of the classifier
  iteration - the current iteration
  classifier - the classifier to cross-validate and obtain predictions from
  numFolds - the cross-validation folds
  seed - the seed for randomizing the data for cross-validation
  iqrMultiplier - the multiplier for the IQR outliers
- Returns: the cleaned up data
-
iterate
protected weka.core.Instances iterate(weka.core.Instances data, int classifierIndex, weka.classifiers.Classifier classifier, int numFolds, long seed, double iqrMultiplier, int numIterations, int maxNonRemovalIterations)
Cleans the data using the specified classifier and parameters.
- Parameters:
  data - the data to clean
  classifierIndex - the index of the classifier
  classifier - the classifier to cross-validate and obtain predictions from
  numFolds - the cross-validation folds
  seed - the seed for randomizing the data for cross-validation
  iqrMultiplier - the multiplier for the IQR outliers
  numIterations - the number of iterations to perform (ignored if 0)
  maxNonRemovalIterations - the maximum number of iterations with no outliers being removed before stopping (ignored if 0)
- Returns: the cleaned up data
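The two stopping criteria of iterate (a fixed number of passes, or a run of consecutive passes that remove nothing) can be sketched as a loop around a single cleaning pass. This is an illustration under stated assumptions, not the ADAMS code: `IterateSketch` and `cleanOnce` are hypothetical names, a plain `List` stands in for `weka.core.Instances`, and rejecting the case where both limits are 0 is a choice made here to avoid an endless loop.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical illustration of the iteration and stopping logic. */
final class IterateSketch {

  /**
   * Repeats a cleaning pass until one of the limits is reached.
   *
   * @param cleanOnce               stand-in for a single cleaning pass
   * @param numIterations           pass limit (ignored if 0)
   * @param maxNonRemovalIterations consecutive zero-removal limit (ignored if 0)
   */
  static <T> List<T> iterate(List<T> data, UnaryOperator<List<T>> cleanOnce,
                             int numIterations, int maxNonRemovalIterations) {
    if (numIterations <= 0 && maxNonRemovalIterations <= 0)
      throw new IllegalArgumentException("At least one stopping criterion is required");
    List<T> current = data;
    int iteration = 0;
    int nonRemoval = 0;
    while (true) {
      List<T> cleaned = cleanOnce.apply(current);
      iteration++;
      // a pass that keeps every row counts towards the non-removal run
      if (cleaned.size() == current.size())
        nonRemoval++;
      else
        nonRemoval = 0;
      current = cleaned;
      if (numIterations > 0 && iteration >= numIterations)
        break;
      if (maxNonRemovalIterations > 0 && nonRemoval >= maxNonRemovalIterations)
        break;
    }
    return current;
  }
}
```

With the documented defaults (0 iterations, 2 non-removal iterations), this loop would keep cleaning until two passes in a row remove no rows.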
-
performPostProcess
protected weka.core.Instances performPostProcess(weka.core.Instances data)
Performs the actual postprocessing.
- Specified by: performPostProcess in class AbstractPostProcessor
- Parameters: data - the dataset to process
- Returns: the processed dataset
-
performPostProcess
protected weka.core.Instance performPostProcess(weka.core.Instance data)
Performs the actual postprocessing.
- Specified by: performPostProcess in class AbstractPostProcessor
- Parameters: data - the instance to process
- Returns: the processed instance
-
getDetails
public adams.data.spreadsheet.SpreadSheet getDetails()
Returns details for the cleaner.
- Specified by: getDetails in interface PostProcessorDetails<adams.data.spreadsheet.SpreadSheet>
- Returns: the details
-
stopExecution
public void stopExecution()
Stops the execution. No message set.
- Specified by: stopExecution in interface adams.core.Stoppable
-
isStopped
public boolean isStopped()
Whether the execution has been stopped.
- Specified by: isStopped in interface adams.core.StoppableWithFeedback
- Returns: true if stopped
-
cleanUp
public void cleanUp()
Cleans up data structures, frees up memory.
- Specified by: cleanUp in interface adams.core.CleanUpHandler
-