Package adams.data.cleaner.instance
Class RemoveOutliers
- java.lang.Object
  - adams.core.logging.LoggingObject
    - adams.core.logging.CustomLoggingLevelObject
      - adams.core.option.AbstractOptionHandler
        - adams.data.cleaner.instance.AbstractCleaner
          - adams.data.cleaner.instance.RemoveOutliers
- All Implemented Interfaces:
adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.Randomizable, adams.core.ShallowCopySupporter<AbstractCleaner>, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.ThreadLimiter, adams.flow.core.FlowContextHandler, Serializable, Comparable
public class RemoveOutliers extends AbstractCleaner implements adams.core.Randomizable, adams.core.ThreadLimiter, adams.core.StoppableWithFeedback
Cross-validates the specified classifier on the incoming data and applies the outlier detector to the actual vs. predicted values to remove the outliers.
Note: this cleaner only works on a full dataset, not instance by instance.
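The workflow described above can be illustrated with a minimal, dependency-free sketch. It is not the ADAMS implementation: the class `OutlierRemovalSketch`, the method `keepIndices`, the mean predictor (standing in for the configured classifier) and the fixed `maxAbsError` threshold (standing in for the configured outlier detector) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative only; hypothetical names, not part of ADAMS or Weka. */
class OutlierRemovalSketch {

    /**
     * Returns the indices of the rows that survive cleaning.
     * Every row is predicted by a model trained on the other folds
     * (here: the mean of the training values), and rows whose absolute
     * error exceeds maxAbsError are treated as outliers.
     */
    static List<Integer> keepIndices(double[] actual, int numFolds, long seed, double maxAbsError) {
        int n = actual.length;
        if (numFolds < 2 || numFolds > n) {
            throw new IllegalArgumentException("numFolds must be in [2, number of rows]");
        }

        // Seeded shuffle so that the fold assignment is reproducible.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        // Cross-validation: collect one prediction per row.
        double[] predicted = new double[n];
        for (int fold = 0; fold < numFolds; fold++) {
            double sum = 0.0;
            int count = 0;
            for (int pos = 0; pos < n; pos++) {
                if (pos % numFolds != fold) {        // training rows
                    sum += actual[order.get(pos)];
                    count++;
                }
            }
            double mean = sum / count;
            for (int pos = 0; pos < n; pos++) {
                if (pos % numFolds == fold) {        // test rows
                    predicted[order.get(pos)] = mean;
                }
            }
        }

        // Outlier detection on actual vs. predicted.
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (Math.abs(actual[i] - predicted[i]) <= maxAbsError) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 1.1, 0.9, 1.0, 50.0, 1.2, 0.8, 1.0};
        // Row 4 (value 50.0) is the only one with an error above 10.
        System.out.println(keepIndices(y, 4, 1L, 10.0)); // prints [0, 1, 2, 3, 5, 6, 7]
    }
}
```

Because every prediction depends on the other rows in the dataset, this kind of check cannot be made for a single instance in isolation, which matches the note above.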
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
    The logging level for outputting errors and debugging output.
    default: WARNING
-pre-filter <weka.filters.Filter> (property: preFilter)
    The filter to use for pre-filtering the data.
    default: weka.filters.AllFilter
-classifier <weka.classifiers.Classifier> (property: classifier)
    The classifier to use for generating the actual vs predicted data.
    default: weka.classifiers.functions.LinearRegressionJ -S 0 -R 1.0E-8
-seed <long> (property: seed)
    The seed value for the cross-validation.
    default: 1
-num-folds <int> (property: numFolds)
    The number of folds to use in the cross-validation.
    default: 10
    minimum: 2
-num-threads <int> (property: numThreads)
    The number of threads to use for cross-validation; -1 = number of CPUs/cores; 0 or 1 = sequential execution.
    default: 1
    minimum: -1
-detector <adams.flow.control.removeoutliers.AbstractOutlierDetector> (property: detector)
    The outlier detector to use.
    default: adams.flow.control.removeoutliers.Null
- Author: FracPete (fracpete at waikato dot ac dot nz)
- See Also: Serialized Form
Field Summary
Fields
Modifier and Type | Field | Description
protected weka.classifiers.Classifier | m_Classifier | the classifier to use for evaluation.
protected weka.classifiers.StoppableEvaluation | m_CurrentEvaluation | the current evaluation.
protected adams.flow.control.removeoutliers.AbstractOutlierDetector | m_Detector | the outlier detector to use.
protected adams.multiprocess.JobRunner | m_JobRunner | the runner in use.
protected adams.flow.standalone.JobRunnerSetup | m_JobRunnerSetup | the jobrunner setup.
protected int | m_NumFolds | the number of folds to use.
protected int | m_NumThreads | the number of threads to use for parallel execution.
protected long | m_Seed | the seed value.
protected boolean | m_Stopped | whether the execution was stopped.
Fields inherited from class adams.data.cleaner.instance.AbstractCleaner
m_ActualPreFilter, m_CleanInstancesError, m_FlowContext, m_PreFilter
Constructor Summary
Constructors
Constructor | Description
RemoveOutliers() |
-
Method Summary
Modifier and Type | Method | Description
String | classifierTipText() | Returns the tip text for this property.
protected weka.classifiers.Evaluation | crossValidate(weka.core.Instances data, int folds) | Cross-validates the classifier on the given data.
void | defineOptions() | Adds options to the internal list of options.
String | detectorTipText() | Returns the tip text for this property.
protected adams.data.spreadsheet.SpreadSheet | evaluationToSpreadSheet(weka.classifiers.Evaluation eval) | Turns the predictions of the evaluation object into a spreadsheet.
weka.classifiers.Classifier | getClassifier() | Returns the classifier.
adams.flow.control.removeoutliers.AbstractOutlierDetector | getDetector() | Returns the detector.
int | getNumFolds() | Returns the number of folds to use in CV.
int | getNumThreads() | Returns the number of threads to use for cross-validation.
long | getSeed() | Returns the seed value.
String | globalInfo() | Returns a string describing the object.
boolean | isStopped() | Whether the execution has been stopped.
String | numFoldsTipText() | Returns the tip text for this property.
String | numThreadsTipText() | Returns the tip text for this property.
protected String | performCheck(weka.core.Instance data) | Performs the actual check.
protected weka.core.Instances | performClean(weka.core.Instances data) | Performs the actual cleaning.
protected void | preCheck(weka.core.Instances data) | Performs some pre-checks on whether the data is actually suitable.
String | seedTipText() | Returns the tip text for this property.
void | setClassifier(weka.classifiers.Classifier value) | Sets the classifier.
void | setDetector(adams.flow.control.removeoutliers.AbstractOutlierDetector value) | Sets the detector.
void | setNumFolds(int value) | Sets the number of folds to use.
void | setNumThreads(int value) | Sets the number of threads to use for cross-validation.
void | setSeed(long value) | Sets the seed value.
void | stopExecution() | Stops the execution.
Methods inherited from class adams.data.cleaner.instance.AbstractCleaner
check, clean, compareTo, equals, forCommandLine, forName, getCleaners, getCleanInstancesError, getFlowContext, getPreFilter, hasCleanInstancesError, preCheck, preFilter, preFilter, preFilterTipText, reset, setFlowContext, setPreFilter, shallowCopy, shallowCopy
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
Field Detail
-
m_Classifier
protected weka.classifiers.Classifier m_Classifier
the classifier to use for evaluation.
-
m_Seed
protected long m_Seed
the seed value.
-
m_NumFolds
protected int m_NumFolds
the number of folds to use.
-
m_Detector
protected adams.flow.control.removeoutliers.AbstractOutlierDetector m_Detector
the outlier detector to use.
-
m_NumThreads
protected int m_NumThreads
the number of threads to use for parallel execution.
-
m_JobRunnerSetup
protected transient adams.flow.standalone.JobRunnerSetup m_JobRunnerSetup
the jobrunner setup.
-
m_JobRunner
protected transient adams.multiprocess.JobRunner m_JobRunner
the runner in use.
-
m_Stopped
protected boolean m_Stopped
whether the execution was stopped.
-
m_CurrentEvaluation
protected transient weka.classifiers.StoppableEvaluation m_CurrentEvaluation
the current evaluation.
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by: globalInfo in interface adams.core.GlobalInfoSupporter
- Specified by: globalInfo in class adams.core.option.AbstractOptionHandler
- Returns: a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by: defineOptions in interface adams.core.option.OptionHandler
- Overrides: defineOptions in class AbstractCleaner
-
setClassifier
public void setClassifier(weka.classifiers.Classifier value)
Sets the classifier.
- Parameters: value - the classifier
-
getClassifier
public weka.classifiers.Classifier getClassifier()
Returns the classifier.
- Returns: the classifier
-
classifierTipText
public String classifierTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setSeed
public void setSeed(long value)
Sets the seed value.
- Specified by: setSeed in interface adams.core.Randomizable
- Parameters: value - the seed
-
getSeed
public long getSeed()
Returns the seed value.
- Specified by: getSeed in interface adams.core.Randomizable
- Returns: the seed
-
seedTipText
public String seedTipText()
Returns the tip text for this property.
- Specified by: seedTipText in interface adams.core.Randomizable
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumFolds
public void setNumFolds(int value)
Sets the number of folds to use.
- Parameters: value - the folds
-
getNumFolds
public int getNumFolds()
Returns the number of folds to use in CV.
- Returns: the folds
-
numFoldsTipText
public String numFoldsTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumThreads
public void setNumThreads(int value)
Sets the number of threads to use for cross-validation.
- Specified by: setNumThreads in interface adams.core.ThreadLimiter
- Parameters: value - the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
-
getNumThreads
public int getNumThreads()
Returns the number of threads to use for cross-validation.
- Specified by: getNumThreads in interface adams.core.ThreadLimiter
- Returns: the number of threads: -1 = # of CPUs/cores; 0/1 = sequential execution
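The encoding of the thread count (-1 for all cores, 0 or 1 for sequential execution) can be sketched as a small mapping function. The class `ThreadCountSketch` and the method `resolve` are hypothetical and not part of ADAMS; in particular, passing values above 1 through unchanged is an assumption, since the documentation does not say whether they are capped at the number of cores.

```java
/** Illustrative only; hypothetical names, not part of ADAMS. */
class ThreadCountSketch {

    /**
     * Maps a numThreads option value to the number of workers to start.
     *
     * @param numThreads     the option value (minimum: -1)
     * @param availableCores the number of CPUs/cores on the machine
     * @return the number of workers, at least 1
     */
    static int resolve(int numThreads, int availableCores) {
        if (numThreads < -1) {
            throw new IllegalArgumentException("numThreads must be >= -1");
        }
        if (numThreads == -1) {
            return availableCores;   // -1 = number of CPUs/cores
        }
        if (numThreads <= 1) {
            return 1;                // 0 or 1 = sequential execution
        }
        return numThreads;           // assumption: used as given
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("workers for -1: " + resolve(-1, cores));
        System.out.println("workers for 0: " + resolve(0, cores));
    }
}
```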
-
numThreadsTipText
public String numThreadsTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setDetector
public void setDetector(adams.flow.control.removeoutliers.AbstractOutlierDetector value)
Sets the detector.
- Parameters: value - the detector
-
getDetector
public adams.flow.control.removeoutliers.AbstractOutlierDetector getDetector()
Returns the detector.
- Returns: the detector
-
detectorTipText
public String detectorTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
preCheck
protected void preCheck(weka.core.Instances data)
Performs some pre-checks on whether the data is actually suitable.
- Overrides: preCheck in class AbstractCleaner
- Parameters: data - the instances to clean
-
performCheck
protected String performCheck(weka.core.Instance data)
Performs the actual check.
- Specified by: performCheck in class AbstractCleaner
- Parameters: data - the instance to check
- Returns: always null
-
crossValidate
protected weka.classifiers.Evaluation crossValidate(weka.core.Instances data, int folds) throws Exception
Cross-validates the classifier on the given data.
- Parameters:
  data - the data to use for cross-validation
  folds - the number of folds
- Returns: the evaluation
- Throws: Exception - if cross-validation fails
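How a seed and a fold count together determine a reproducible partition of the data can be sketched as follows. This is a generic illustration, not the partitioning that Weka or ADAMS actually performs (which may, for example, stratify the data); the class `FoldAssignmentSketch` and the method `folds` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative only; hypothetical names, not part of ADAMS or Weka. */
class FoldAssignmentSketch {

    /**
     * Splits the row indices 0..numRows-1 into numFolds test folds.
     * The same seed always yields the same partition.
     */
    static List<List<Integer>> folds(int numRows, int numFolds, long seed) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("at least 2 folds are required");
        }
        if (numFolds > numRows) {
            throw new IllegalArgumentException("more folds than rows");
        }

        // Shuffle the row indices with a seeded generator.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        // Deal the shuffled indices out to the folds in round-robin fashion.
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            result.add(new ArrayList<>());
        }
        for (int pos = 0; pos < numRows; pos++) {
            result.get(pos % numFolds).add(order.get(pos));
        }
        return result;
    }

    public static void main(String[] args) {
        // Each row index appears in exactly one test fold.
        for (List<Integer> fold : folds(10, 3, 1L)) {
            System.out.println(fold);
        }
    }
}
```

Each row ends up in exactly one test fold, so a full cross-validation run produces one prediction per row, which is what a comparison of actual vs. predicted values needs.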
-
evaluationToSpreadSheet
protected adams.data.spreadsheet.SpreadSheet evaluationToSpreadSheet(weka.classifiers.Evaluation eval)
Turns the predictions of the evaluation object into a spreadsheet.
- Parameters: eval - the evaluation object to convert
- Returns: the generated spreadsheet
-
performClean
protected weka.core.Instances performClean(weka.core.Instances data)
Performs the actual cleaning.
- Specified by: performClean in class AbstractCleaner
- Parameters: data - the instances to clean
- Returns: the cleaned instances
-
stopExecution
public void stopExecution()
Stops the execution. No message set.
- Specified by: stopExecution in interface adams.core.Stoppable
-
isStopped
public boolean isStopped()
Whether the execution has been stopped.
- Specified by: isStopped in interface adams.core.StoppableWithFeedback
- Returns: true if stopped