Package adams.data.cleaner.instance
Class RemoveOutliers
- java.lang.Object
  - adams.core.logging.LoggingObject
    - adams.core.logging.CustomLoggingLevelObject
      - adams.core.option.AbstractOptionHandler
        - adams.data.cleaner.instance.AbstractCleaner
          - adams.data.cleaner.instance.RemoveOutliers
- All Implemented Interfaces:
adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.Randomizable, adams.core.ShallowCopySupporter<AbstractCleaner>, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.ThreadLimiter, adams.flow.core.FlowContextHandler, Serializable, Comparable
public class RemoveOutliers extends AbstractCleaner implements adams.core.Randomizable, adams.core.ThreadLimiter, adams.core.StoppableWithFeedback
Cross-validates the specified classifier on the incoming data and applies the outlier detector to the actual vs predicted data to remove the outliers.
NB: only works on full dataset, not instance by instance.
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-pre-filter <weka.filters.Filter> (property: preFilter) The filter to use for pre-filtering the data. default: weka.filters.AllFilter
-classifier <weka.classifiers.Classifier> (property: classifier) The classifier to use for generating the actual vs predicted data. default: weka.classifiers.functions.LinearRegressionJ -S 0 -R 1.0E-8
-seed <long> (property: seed) The seed value for the cross-validation. default: 1
-num-folds <int> (property: numFolds) The number of folds to use in the cross-validation. default: 10 minimum: 2
-num-threads <int> (property: numThreads) The number of threads to use for cross-validation; -1 = number of CPUs/cores; 0 or 1 = sequential execution. default: 1 minimum: -1
-detector <adams.flow.control.removeoutliers.AbstractOutlierDetector> (property: detector) The outlier detector to use. default: adams.flow.control.removeoutliers.Null
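The idea behind the options above can be sketched without ADAMS or Weka on the classpath. The following is a minimal, library-free illustration and not the actual implementation: the class name CrossValidatedOutlierSketch, the one-attribute least-squares model standing in for the classifier, and the absolute-error threshold standing in for the outlier detector are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of cross-validated outlier removal; not the ADAMS code. */
final class CrossValidatedOutlierSketch {

  /** Fits y = a + b*x by least squares on the given rows; returns {a, b}. */
  static double[] fit(double[] x, double[] y, List<Integer> rows) {
    double mx = 0, my = 0;
    for (int i : rows) { mx += x[i]; my += y[i]; }
    mx /= rows.size();
    my /= rows.size();
    double sxy = 0, sxx = 0;
    for (int i : rows) {
      sxy += (x[i] - mx) * (y[i] - my);
      sxx += (x[i] - mx) * (x[i] - mx);
    }
    double b = (sxx == 0) ? 0 : sxy / sxx;
    return new double[] {my - b * mx, b};
  }

  /** Returns the indices of the rows that survive the outlier check. */
  static List<Integer> clean(double[] x, double[] y, int folds, long seed, double maxAbsError) {
    // Seeded shuffle so that the fold assignment is reproducible.
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < x.length; i++) order.add(i);
    Collections.shuffle(order, new Random(seed));

    // Cross-validation: every row is predicted by a model that never saw it.
    double[] predicted = new double[x.length];
    for (int f = 0; f < folds; f++) {
      List<Integer> train = new ArrayList<>();
      List<Integer> test = new ArrayList<>();
      for (int p = 0; p < order.size(); p++) {
        (p % folds == f ? test : train).add(order.get(p));
      }
      double[] model = fit(x, y, train);
      for (int i : test) predicted[i] = model[0] + model[1] * x[i];
    }

    // Stand-in "detector": drop rows whose actual vs predicted error is too large.
    List<Integer> kept = new ArrayList<>();
    for (int i = 0; i < x.length; i++) {
      if (Math.abs(y[i] - predicted[i]) <= maxAbsError) kept.add(i);
    }
    return kept;
  }

  public static void main(String[] args) {
    double[] x = new double[20];
    double[] y = new double[20];
    for (int i = 0; i < 20; i++) { x[i] = i; y[i] = 2.0 * i + 1.0; }
    y[7] = 100.0; // inject one gross outlier
    System.out.println(clean(x, y, 10, 1L, 30.0));
  }
}
```

Because the predictions come from cross-validation, the whole dataset is needed up front, which is consistent with the note that the cleaner does not work instance by instance.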
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected weka.classifiers.Classifier | m_Classifier | the classifier to use for evaluation.
protected weka.classifiers.StoppableEvaluation | m_CurrentEvaluation | the current evaluation.
protected adams.flow.control.removeoutliers.AbstractOutlierDetector | m_Detector | the outlier detector to use.
protected adams.multiprocess.JobRunner | m_JobRunner | the runner in use.
protected adams.flow.standalone.JobRunnerSetup | m_JobRunnerSetup | the jobrunner setup.
protected int | m_NumFolds | the number of folds to use.
protected int | m_NumThreads | the number of threads to use for parallel execution.
protected long | m_Seed | the seed value.
protected boolean | m_Stopped | whether the execution was stopped.
-
Fields inherited from class adams.data.cleaner.instance.AbstractCleaner
m_ActualPreFilter, m_CleanInstancesError, m_FlowContext, m_PreFilter
-
-
Constructor Summary
Constructors
Constructor | Description
RemoveOutliers() | -
-
Method Summary
Modifier and Type | Method | Description
String | classifierTipText() | Returns the tip text for this property.
protected weka.classifiers.Evaluation | crossValidate(weka.core.Instances data, int folds) | Cross-validates the classifier on the given data.
void | defineOptions() | Adds options to the internal list of options.
String | detectorTipText() | Returns the tip text for this property.
protected adams.data.spreadsheet.SpreadSheet | evaluationToSpreadSheet(weka.classifiers.Evaluation eval) | Turns the predictions of the evaluation object into a spreadsheet.
weka.classifiers.Classifier | getClassifier() | Returns the classifier.
adams.flow.control.removeoutliers.AbstractOutlierDetector | getDetector() | Returns the detector.
int | getNumFolds() | Returns the number of folds to use in CV.
int | getNumThreads() | Returns the number of threads to use for cross-validation.
long | getSeed() | Returns the seed value.
String | globalInfo() | Returns a string describing the object.
boolean | isStopped() | Whether the execution has been stopped.
String | numFoldsTipText() | Returns the tip text for this property.
String | numThreadsTipText() | Returns the tip text for this property.
protected String | performCheck(weka.core.Instance data) | Performs the actual check.
protected weka.core.Instances | performClean(weka.core.Instances data) | Performs the actual cleaning.
protected void | preCheck(weka.core.Instances data) | Performs some pre-checks on whether the data is actually suitable.
String | seedTipText() | Returns the tip text for this property.
void | setClassifier(weka.classifiers.Classifier value) | Sets the classifier.
void | setDetector(adams.flow.control.removeoutliers.AbstractOutlierDetector value) | Sets the detector.
void | setNumFolds(int value) | Sets the number of folds to use.
void | setNumThreads(int value) | Sets the number of threads to use for cross-validation.
void | setSeed(long value) | Sets the seed value.
void | stopExecution() | Stops the execution.
-
Methods inherited from class adams.data.cleaner.instance.AbstractCleaner
check, clean, compareTo, equals, forCommandLine, forName, getCleaners, getCleanInstancesError, getFlowContext, getPreFilter, hasCleanInstancesError, preCheck, preFilter, preFilter, preFilterTipText, reset, setFlowContext, setPreFilter, shallowCopy, shallowCopy
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Field Detail
-
m_Classifier
protected weka.classifiers.Classifier m_Classifier
the classifier to use for evaluation.
-
m_Seed
protected long m_Seed
the seed value.
-
m_NumFolds
protected int m_NumFolds
the number of folds to use.
-
m_Detector
protected adams.flow.control.removeoutliers.AbstractOutlierDetector m_Detector
the outlier detector to use.
-
m_NumThreads
protected int m_NumThreads
the number of threads to use for parallel execution.
-
m_JobRunnerSetup
protected transient adams.flow.standalone.JobRunnerSetup m_JobRunnerSetup
the jobrunner setup.
-
m_JobRunner
protected transient adams.multiprocess.JobRunner m_JobRunner
the runner in use.
-
m_Stopped
protected boolean m_Stopped
whether the execution was stopped.
-
m_CurrentEvaluation
protected transient weka.classifiers.StoppableEvaluation m_CurrentEvaluation
the current evaluation.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface adams.core.GlobalInfoSupporter
- Specified by:
globalInfo in class adams.core.option.AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface adams.core.option.OptionHandler
- Overrides:
defineOptions in class AbstractCleaner
-
setClassifier
public void setClassifier(weka.classifiers.Classifier value)
Sets the classifier.
- Parameters:
value - the classifier
-
getClassifier
public weka.classifiers.Classifier getClassifier()
Returns the classifier.
- Returns:
- the classifier
-
classifierTipText
public String classifierTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setSeed
public void setSeed(long value)
Sets the seed value.
- Specified by:
setSeed in interface adams.core.Randomizable
- Parameters:
value - the seed
-
getSeed
public long getSeed()
Returns the seed value.
- Specified by:
getSeed in interface adams.core.Randomizable
- Returns:
- the seed
-
seedTipText
public String seedTipText()
Returns the tip text for this property.
- Specified by:
seedTipText in interface adams.core.Randomizable
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumFolds
public void setNumFolds(int value)
Sets the number of folds to use.
- Parameters:
value - the folds
-
getNumFolds
public int getNumFolds()
Returns the number of folds to use in CV.
- Returns:
- the folds
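
How the seed and the number of folds interact can be illustrated with a small, self-contained sketch. It is an assumption-laden example rather than the code used by this class: the name SeededFoldsSketch and the round-robin assignment after a seeded shuffle are hypothetical, and Weka's own randomization and stratification may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: reproducible fold assignment from a seed. */
final class SeededFoldsSketch {

  /** Returns, for each of the n rows, the index of the fold it is tested in. */
  static int[] assignFolds(int n, int folds, long seed) {
    if (folds < 2) {
      throw new IllegalArgumentException("folds must be >= 2, got " + folds);
    }
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < n; i++) order.add(i);
    // The same seed always yields the same permutation, hence the same folds.
    Collections.shuffle(order, new Random(seed));
    int[] fold = new int[n];
    for (int p = 0; p < n; p++) fold[order.get(p)] = p % folds;
    return fold;
  }

  public static void main(String[] args) {
    int[] first = assignFolds(25, 10, 1L);
    int[] second = assignFolds(25, 10, 1L);
    System.out.println("same seed, same folds: " + Arrays.equals(first, second));
  }
}
```

Keeping the seed fixed makes a cleaning run repeatable, since the same rows end up in the same test folds each time.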
-
numFoldsTipText
public String numFoldsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumThreads
public void setNumThreads(int value)
Sets the number of threads to use for cross-validation.
- Specified by:
setNumThreads in interface adams.core.ThreadLimiter
- Parameters:
value - the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
-
getNumThreads
public int getNumThreads()
Returns the number of threads to use for cross-validation.
- Specified by:
getNumThreads in interface adams.core.ThreadLimiter
- Returns:
- the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
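
The thread-count convention described here (-1 = number of CPUs/cores; 0 or 1 = sequential) can be written down as a small helper. This is only a sketch of the documented convention: the class ThreadCountSketch and its resolve method are hypothetical, and whether ADAMS caps values above the core count is not stated in this page, so the sketch passes them through unchanged.

```java
/** Hypothetical helper that maps the numThreads option to an actual thread count. */
final class ThreadCountSketch {

  static int resolve(int numThreads) {
    if (numThreads < -1) {
      // The option documents -1 as its minimum.
      throw new IllegalArgumentException("numThreads must be >= -1, got " + numThreads);
    }
    if (numThreads == -1) {
      // -1 = use as many threads as there are CPUs/cores.
      return Runtime.getRuntime().availableProcessors();
    }
    // 0 and 1 both mean sequential execution, i.e. a single thread.
    return Math.max(1, numThreads);
  }

  public static void main(String[] args) {
    System.out.println("threads for -1: " + resolve(-1));
    System.out.println("threads for 0: " + resolve(0));
  }
}
```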
-
numThreadsTipText
public String numThreadsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setDetector
public void setDetector(adams.flow.control.removeoutliers.AbstractOutlierDetector value)
Sets the detector.
- Parameters:
value - the detector
-
getDetector
public adams.flow.control.removeoutliers.AbstractOutlierDetector getDetector()
Returns the detector.
- Returns:
- the detector
-
detectorTipText
public String detectorTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
preCheck
protected void preCheck(weka.core.Instances data)
Performs some pre-checks on whether the data is actually suitable.
- Overrides:
preCheck in class AbstractCleaner
- Parameters:
data - the instances to clean
-
performCheck
protected String performCheck(weka.core.Instance data)
Performs the actual check.
- Specified by:
performCheck in class AbstractCleaner
- Parameters:
data - the instance to check
- Returns:
- always null
-
crossValidate
protected weka.classifiers.Evaluation crossValidate(weka.core.Instances data, int folds) throws Exception
Cross-validates the classifier on the given data.
- Parameters:
data - the data to use for cross-validation
folds - the number of folds
- Returns:
- the evaluation
- Throws:
Exception - if cross-validation fails
-
evaluationToSpreadSheet
protected adams.data.spreadsheet.SpreadSheet evaluationToSpreadSheet(weka.classifiers.Evaluation eval)
Turns the predictions of the evaluation object into a spreadsheet.
- Parameters:
eval - the evaluation object to convert
- Returns:
- the generated spreadsheet
-
performClean
protected weka.core.Instances performClean(weka.core.Instances data)
Performs the actual cleaning.
- Specified by:
performClean in class AbstractCleaner
- Parameters:
data - the instances to clean
- Returns:
- the cleaned instances
-
stopExecution
public void stopExecution()
Stops the execution. No message set.
- Specified by:
stopExecution in interface adams.core.Stoppable
-
isStopped
public boolean isStopped()
Whether the execution has been stopped.
- Specified by:
isStopped in interface adams.core.StoppableWithFeedback
- Returns:
- true if stopped
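
The stopExecution()/isStopped() pair follows a cooperative stop pattern, which might look roughly like the sketch below. It is an illustration under assumptions, not the class's actual code: StoppableLoopSketch, its run method, and the stopAt parameter (which simulates an external stop request) are invented for this example.

```java
/** Hypothetical sketch of a cooperative stop flag checked by a work loop. */
final class StoppableLoopSketch {

  /** volatile so that a stop request from another thread becomes visible. */
  private volatile boolean stopped;

  /** Requests the stop; the work loop notices it at its next check. */
  public void stopExecution() {
    stopped = true;
  }

  /** Whether a stop has been requested. */
  public boolean isStopped() {
    return stopped;
  }

  /**
   * Processes up to the given number of items and returns how many were done.
   * stopAt simulates an external stop request after that many items (-1 = never).
   */
  public int run(int items, int stopAt) {
    stopped = false;
    int done = 0;
    for (int i = 0; i < items && !stopped; i++) {
      done++; // stand-in for one unit of work, e.g. one fold
      if (done == stopAt) stopExecution();
    }
    return done;
  }

  public static void main(String[] args) {
    StoppableLoopSketch sketch = new StoppableLoopSketch();
    System.out.println("processed: " + sketch.run(10, 3));
    System.out.println("stopped: " + sketch.isStopped());
  }
}
```

After a run, a caller can use isStopped() to tell an interrupted run apart from one that finished normally.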
-
-