java.lang.Object
weka.classifiers.AbstractClassifier
weka.classifiers.SingleClassifierEnhancer
weka.classifiers.RandomizableSingleClassifierEnhancer
weka.classifiers.meta.MultiSearch
public class MultiSearch
Performs a search over an arbitrary number of parameters of a classifier and chooses the best combination of parameter values found for the actual filtering and training.
By default, MultiSearch uses the following FilteredClassifier setup:
- classifier: LinearRegression, searching for the "Ridge"
- filter: PLSFilter, searching for the "# of Components"
The properties to explore are entirely up to the user: they can be a mix of classifier and filter properties, only classifier properties, or only filter properties.
Since the MultiSearch classifier itself is used as the base object for the generated setups, the properties must be prefixed with 'classifier.' (referring to MultiSearch's 'classifier' property).
E.g., if you have selected a FilteredClassifier with a PLSFilter as the base classifier and want to explore the number of PLS components, the property is made up of the following components:
- classifier: referring to MultiSearch's classifier property, i.e., the FilteredClassifier.
- filter: referring to the FilteredClassifier's filter property (= PLSFilter)
- numComponents: the actual property of the PLSFilter that we want to modify
And assembled, the property looks like this:
classifier.filter.numComponents
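The following is a minimal sketch, assuming plain JavaBeans getter/setter pairs, of how a dotted path such as classifier.filter.numComponents can be walked with java.beans.Introspector. The classes SearchBean, FilteredBean, PlsFilterBean and PropertyPath are hypothetical stand-ins, not Weka classes, and MultiSearch's own property resolution may differ.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

// Hypothetical beans standing in for MultiSearch -> FilteredClassifier -> PLSFilter.
class PlsFilterBean {
  private int numComponents = 20;
  public int getNumComponents() { return numComponents; }
  public void setNumComponents(int value) { numComponents = value; }
}

class FilteredBean {
  private PlsFilterBean filter = new PlsFilterBean();
  public PlsFilterBean getFilter() { return filter; }
  public void setFilter(PlsFilterBean value) { filter = value; }
}

class SearchBean {
  private FilteredBean classifier = new FilteredBean();
  public FilteredBean getClassifier() { return classifier; }
  public void setClassifier(FilteredBean value) { classifier = value; }
}

class PropertyPath {
  // Finds the JavaBeans descriptor for a single property name on a class.
  static PropertyDescriptor descriptor(Class<?> cls, String name) {
    try {
      for (PropertyDescriptor pd : Introspector.getBeanInfo(cls).getPropertyDescriptors()) {
        if (pd.getName().equals(name)) {
          return pd;
        }
      }
    } catch (IntrospectionException e) {
      throw new IllegalStateException(e);
    }
    throw new IllegalArgumentException("No property '" + name + "' on " + cls.getName());
  }

  // Follows every segment of the path through its getter and returns the final value.
  static Object get(Object root, String path) {
    Object current = root;
    try {
      for (String part : path.split("\\.")) {
        current = descriptor(current.getClass(), part).getReadMethod().invoke(current);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
    return current;
  }

  // Resolves all but the last segment via getters, then calls the last segment's setter.
  static void set(Object root, String path, Object value) {
    int lastDot = path.lastIndexOf('.');
    Object owner = lastDot < 0 ? root : get(root, path.substring(0, lastDot));
    String name = path.substring(lastDot + 1);
    try {
      descriptor(owner.getClass(), name).getWriteMethod().invoke(owner, value);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

For example, PropertyPath.set(new SearchBean(), "classifier.filter.numComponents", 5) walks two getters and then calls setNumComponents(5) on the innermost bean.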
The initial space is evaluated with 2-fold CV to determine the performance of the parameter values for the selected type of evaluation (e.g., accuracy). The best point in the space is then taken as the center, and 10-fold CV is performed on the adjacent parameter values. If better parameters are found, that point becomes the new center and another 10-fold CV is performed (a form of hill-climbing). This process is repeated until no better setup is found or the best setup lies on the border of the parameter space.
The number of CV folds for the initial and subsequent spaces can be adjusted, of course.
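The search procedure above can be sketched on a hypothetical two-dimensional grid of parameter indices. The caller-supplied score function stands in for a cross-validated evaluation with the given number of folds (lower is better), and treating the 8-neighbourhood as the "adjacent parameters" is an assumption; HillClimbSketch is illustrative, not Weka's implementation.

```java
import java.util.function.BiFunction;

class HillClimbSketch {
  // True if the point lies on the outer edge of a rows x cols grid.
  static boolean onBorder(int[] p, int rows, int cols) {
    return p[0] == 0 || p[1] == 0 || p[0] == rows - 1 || p[1] == cols - 1;
  }

  // score.apply(point, folds) stands in for a cross-validated evaluation; lower is better.
  static int[] search(int rows, int cols, int initialFolds, int subsequentFolds,
                      BiFunction<int[], Integer, Double> score) {
    // Step 1: evaluate every point of the initial space with the cheaper fold setting.
    int[] best = {0, 0};
    double bestScore = Double.POSITIVE_INFINITY;
    for (int r = 0; r < rows; r++) {
      for (int c = 0; c < cols; c++) {
        double s = score.apply(new int[] {r, c}, initialFolds);
        if (s < bestScore) {
          bestScore = s;
          best = new int[] {r, c};
        }
      }
    }
    // Step 2: hill-climb, re-evaluating the centre and its neighbours with more folds.
    while (!onBorder(best, rows, cols)) {
      int[] centre = best;
      int[] next = centre;
      double nextScore = score.apply(centre, subsequentFolds);
      for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
          if (dr == 0 && dc == 0) {
            continue;
          }
          int[] neighbour = {centre[0] + dr, centre[1] + dc};
          double s = score.apply(neighbour, subsequentFolds);
          if (s < nextScore) {
            nextScore = s;
            next = neighbour;
          }
        }
      }
      if (next == centre) {
        break; // no better neighbour: the current centre is the result
      }
      best = next; // a better neighbour becomes the new centre
    }
    return best; // either a local optimum or a point on the border
  }
}
```

Because a move only happens on a strict improvement of a deterministic score, the loop cannot cycle on a finite grid; with real, randomized CV estimates that guarantee would need more care.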
A mathematical function produces a double; where necessary, MultiSearch converts it to an integer (the value is simply cast to int), a boolean (0 is false, anything else is true), a float, a char or a long.
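The conversion rules above can be sketched as a single method. ValueConversionSketch is a hypothetical helper, not part of Weka. Note that a Java cast truncates toward zero, so 2.9 becomes 2 and -2.9 becomes -2.

```java
class ValueConversionSketch {
  // Converts the double produced by a mathematical expression to the property's type.
  static Object convert(double value, Class<?> target) {
    if (target == double.class || target == Double.class) {
      return value;
    }
    if (target == int.class || target == Integer.class) {
      return (int) value; // plain cast: truncates toward zero
    }
    if (target == long.class || target == Long.class) {
      return (long) value;
    }
    if (target == float.class || target == Float.class) {
      return (float) value;
    }
    if (target == char.class || target == Character.class) {
      return (char) value; // e.g. 65.0 -> 'A'
    }
    if (target == boolean.class || target == Boolean.class) {
      return value != 0; // 0 is false, anything else is true
    }
    throw new IllegalArgumentException("Unsupported target type: " + target);
  }
}
```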
Via a user-supplied, blank-separated 'list' of values, one can also set strings and selected tags (drop-down combo boxes in Weka's GenericObjectEditor). Class names with options (e.g., classifiers with their options) are possible as well.
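A minimal sketch of turning such a list into individual values and, for a selected tag, mapping each value onto one of the tag's options. The enum SelectionMethodSketch (modelled loosely on LinearRegression's attribute selection methods) and the helper ListParameterSketch are hypothetical; the naive whitespace split would break class names that carry their own options, and how MultiSearch tokenizes those is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical enum standing in for a selected-tag property.
enum SelectionMethodSketch { M5, NONE, GREEDY }

class ListParameterSketch {
  // Splits a blank-separated list into its individual values.
  static List<String> split(String list) {
    String trimmed = list.trim();
    if (trimmed.isEmpty()) {
      return new ArrayList<>();
    }
    return Arrays.asList(trimmed.split("\\s+"));
  }

  // Maps a value onto the enum constant with the same name (case-insensitive).
  static <E extends Enum<E>> E toTag(Class<E> type, String value) {
    return Enum.valueOf(type, value.toUpperCase(Locale.ROOT));
  }
}
```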
After buildClassifier has been called, the best classifier setup can be accessed via the getBestClassifier method.
-E <CC|RMSE|RRSE|MAE|RAE|COMB|ACC|KAP> Determines the parameter used for evaluation: CC = Correlation coefficient, RMSE = Root mean squared error, RRSE = Root relative squared error, MAE = Mean absolute error, RAE = Relative absolute error, COMB = Combined = (1-abs(CC)) + RRSE + RAE, ACC = Accuracy, KAP = Kappa (default: CC)
-search "<classname options>" A property search setup.
-sample-size <num> The size (in percent) of the sample to search the initial space with. (default: 100)
-log-file <filename> The log file to log the messages to. (default: none)
-initial-folds <num> The number of cross-validation folds for the initial space. Numbers smaller than 2 turn off cross-validation and just perform evaluation on the training set. (default: 2)
-subsequent-folds <num> The number of cross-validation folds for the subsequent sub-spaces. Numbers smaller than 2 turn off cross-validation and just perform evaluation on the training set. (default: 10)
-num-slots <num> Number of execution slots. (default: 1, i.e., no parallelism)
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.meta.FilteredClassifier)
Options specific to classifier weka.classifiers.meta.FilteredClassifier:
-F <filter specification> Full class name of filter to use, followed by filter options. E.g.: "weka.filters.unsupervised.attribute.Remove -V -R 1,2"
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.trees.J48)
Options specific to classifier weka.classifiers.functions.LinearRegression:
-D Produce debugging output. (default no debugging output)
-S <number of selection method> Set the attribute selection method to use. 1 = None, 2 = Greedy. (default 0 = M5' method)
-C Do not try to eliminate collinear attributes.
-R <double> Set ridge parameter (default 1.0e-8).
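The COMB measure listed for -E combines three of the other measures. The sketch below computes CC, RMSE, MAE, RRSE, RAE and COMB from arrays of actual and predicted values. It is illustrative only: the relative errors use the mean of the actual values as the reference predictor and are expressed as fractions, whereas Weka's Evaluation reports RRSE and RAE in percent relative to a prior estimate, so COMB values produced by MultiSearch may be on a different scale. MetricsSketch is a hypothetical class.

```java
class MetricsSketch {
  static double mean(double[] x) {
    double sum = 0;
    for (double v : x) {
      sum += v;
    }
    return sum / x.length;
  }

  // CC: Pearson correlation between actual and predicted values.
  static double cc(double[] actual, double[] predicted) {
    double ma = mean(actual), mp = mean(predicted);
    double sap = 0, saa = 0, spp = 0;
    for (int i = 0; i < actual.length; i++) {
      double da = actual[i] - ma, dp = predicted[i] - mp;
      sap += da * dp;
      saa += da * da;
      spp += dp * dp;
    }
    return sap / Math.sqrt(saa * spp);
  }

  // RMSE: root mean squared error.
  static double rmse(double[] actual, double[] predicted) {
    double sum = 0;
    for (int i = 0; i < actual.length; i++) {
      double e = predicted[i] - actual[i];
      sum += e * e;
    }
    return Math.sqrt(sum / actual.length);
  }

  // MAE: mean absolute error.
  static double mae(double[] actual, double[] predicted) {
    double sum = 0;
    for (int i = 0; i < actual.length; i++) {
      sum += Math.abs(predicted[i] - actual[i]);
    }
    return sum / actual.length;
  }

  // RRSE: squared error relative to always predicting the mean of the actual values.
  static double rrse(double[] actual, double[] predicted) {
    double ma = mean(actual), num = 0, den = 0;
    for (int i = 0; i < actual.length; i++) {
      double e = predicted[i] - actual[i], d = actual[i] - ma;
      num += e * e;
      den += d * d;
    }
    return Math.sqrt(num / den);
  }

  // RAE: absolute error relative to always predicting the mean of the actual values.
  static double rae(double[] actual, double[] predicted) {
    double ma = mean(actual), num = 0, den = 0;
    for (int i = 0; i < actual.length; i++) {
      num += Math.abs(predicted[i] - actual[i]);
      den += Math.abs(actual[i] - ma);
    }
    return num / den;
  }

  // COMB = (1 - |CC|) + RRSE + RAE; lower is better by construction.
  static double comb(double[] actual, double[] predicted) {
    return (1 - Math.abs(cc(actual, predicted))) + rrse(actual, predicted) + rae(actual, predicted);
  }
}
```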
| Nested Class Summary | |
|---|---|
| protected static class | MultiSearch.EvaluationTask: Helper class for evaluating a setup. |
| Field Summary | |
|---|---|
| protected weka.classifiers.Classifier | m_BestClassifier: the Classifier with the best setup. |
| protected PerformanceCache | m_Cache: the cache for points in the space that got calculated (raw points in space, not evaluated ones!). |
| protected int | m_Completed: The number of setups completed so far. |
| protected AbstractParameter[] | m_DefaultParameters: the default parameters. |
| protected int | m_Evaluation: the type of evaluation. |
| protected ThreadPoolExecutor | m_ExecutorPool: Pool of threads to train models with. |
| protected int | m_Failed: The number of setups that experienced a failure of some sort during construction. |
| protected weka.classifiers.meta.FilteredClassifier | m_FilteredClassifier: the filtered classifier to use, in case a filter is used. |
| protected SetupGenerator | m_Generator: for generating the search parameters. |
| protected int | m_InitialSpaceNumFolds: number of cross-validation folds in the initial space. |
| protected File | m_LogFile: the log file to use. |
| protected int | m_NumExecutionSlots: The number of threads to have executing at any one time. |
| protected int | m_NumSetups: the number of setups to evaluate. |
| protected Vector<Performance> | m_Performances: for storing the performances. |
| protected double | m_SampleSize: the sample size to search the initial space with. |
| protected Space | m_Space: the parameter space. |
| protected int | m_SubsequentSpaceNumFolds: number of cross-validation folds in the subsequent spaces. |
| protected boolean | m_UniformPerformance: whether all performances in the space are the same. |
| protected Point<Object> | m_Values: the best values. |
| static weka.core.Tag[] | TAGS_EVALUATION: the tags for the available evaluation types. |

| Fields inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer |
|---|
| m_Seed |

| Fields inherited from class weka.classifiers.SingleClassifierEnhancer |
|---|
| m_Classifier |

| Fields inherited from class weka.classifiers.AbstractClassifier |
|---|
| m_Debug |

| Constructor Summary | |
|---|---|
| MultiSearch() | the default constructor. |

| Method Summary | |
|---|---|
| protected void | addPerformance(Performance performance, int folds): Adds the performance to the cache and the current list of performances. |
| protected void | block(boolean doBlock): Helper method used for blocking. |
| void | buildClassifier(weka.core.Instances data): builds the classifier. |
| protected void | completedEvaluation(Object obj, boolean success): Records the completion of the training of a single classifier. |
| protected String | defaultClassifierString(): String describing default classifier. |
| protected Point<Object> | determineBestInSpace(Space space, weka.core.Instances inst, int folds): determines the best point for the given space, using CV with the specified number of folds. |
| double[] | distributionForInstance(weka.core.Instance instance): Returns the distribution for the given instance. |
| Enumeration | enumerateMeasures(): Returns an enumeration of the measure names. |
| String | evaluationTipText(): Returns the tip text for this property. |
| protected Point<Object> | findBest(weka.core.Instances inst): returns the best point in the space. |
| weka.classifiers.Classifier | getBestClassifier(): returns the best Classifier setup. |
| weka.core.Capabilities | getCapabilities(): Returns default capabilities of the classifier. |
| protected String | getCommandline(Object obj): Returns the commandline of the given object. |
| weka.core.SelectedTag | getEvaluation(): Gets the criterion used for evaluating the classifier performance. |
| int | getInitialSpaceNumFolds(): Gets the number of CV folds for the initial space. |
| File | getLogFile(): Gets current log file. |
| double | getMeasure(String measureName): Returns the value of the named measure. |
| int | getNumExecutionSlots(): Get the number of execution slots (threads) to use for training the models. |
| String[] | getOptions(): returns the options of the current setup. |
| String | getRevision(): Returns the revision string. |
| double | getSampleSizePercent(): Gets the sample size for the initial space search. |
| AbstractParameter[] | getSearchParameters(): Returns the search parameters. |
| int | getSubsequentSpaceNumFolds(): Gets the number of CV folds for the subsequent sub-spaces. |
| Point<Object> | getValues(): returns the parameter values that were found to work best. |
| String | globalInfo(): Returns a string describing classifier. |
| String | initialSpaceNumFoldsTipText(): Returns the tip text for this property. |
| Enumeration | listOptions(): Gets an enumeration describing the available options. |
| protected void | log(String message): prints the specified message to stdout if debug is on and can also dump the message to a log file. |
| protected void | log(String message, boolean onlyLog): prints the specified message to stdout if debug is on and can also dump the message to a log file. |
| String | logFileTipText(): Returns the tip text for this property. |
| protected void | logPerformances(Space space, Vector<Performance> performances): aligns all performances in the space and prints those tables to the log file. |
| protected String | logPerformances(Space space, Vector<Performance> performances, weka.core.Tag type): generates a table string for all the performances in the space and returns that. |
| static void | main(String[] args): Main method for running this classifier from commandline. |
| String | numExecutionSlotsTipText(): Returns the tip text for this property. |
| String | sampleSizePercentTipText(): Returns the tip text for this property. |
| String | searchParametersTipText(): Returns the tip text for this property. |
| void | setClassifier(weka.classifiers.Classifier newClassifier): Set the base learner. |
| void | setEvaluation(weka.core.SelectedTag value): Sets the criterion to use for evaluating the classifier performance. |
| void | setInitialSpaceNumFolds(int value): Sets the number of CV folds for the initial space. |
| void | setLogFile(File value): Sets the log file to use. |
| void | setNumExecutionSlots(int value): Set the number of execution slots (threads) to use for training the models. |
| void | setOptions(String[] options): Parses the options for this object. |
| void | setSampleSizePercent(double value): Sets the sample size for the initial space search. |
| void | setSearchParameters(AbstractParameter[] value): Sets the search parameters. |
| void | setSubsequentSpaceNumFolds(int value): Sets the number of CV folds for the subsequent sub-spaces. |
| protected void | startExecutorPool(): Start the pool of execution threads. |
| protected void | stopExecutorPool(): Stops the pool of execution threads. |
| String | subsequentSpaceNumFoldsTipText(): Returns the tip text for this property. |
| String | toString(): returns a string representation of the classifier. |
| String | toSummaryString(): Returns a string that summarizes the object. |
| protected String[] | updateOption(String[] options, String option, String value): replaces the current option in the options array with a new value. |

| Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer |
|---|
| getSeed, seedTipText, setSeed |

| Methods inherited from class weka.classifiers.SingleClassifierEnhancer |
|---|
| classifierTipText, getClassifier, getClassifierSpec |

| Methods inherited from class weka.classifiers.AbstractClassifier |
|---|
| classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, runClassifier, setDebug |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Field Detail |
|---|
public static final weka.core.Tag[] TAGS_EVALUATION
protected weka.classifiers.Classifier m_BestClassifier
protected Point<Object> m_Values
protected int m_Evaluation
protected SetupGenerator m_Generator
protected double m_SampleSize
protected File m_LogFile
protected Space m_Space
protected PerformanceCache m_Cache
protected boolean m_UniformPerformance
protected weka.classifiers.meta.FilteredClassifier m_FilteredClassifier
protected AbstractParameter[] m_DefaultParameters
protected int m_InitialSpaceNumFolds
protected int m_SubsequentSpaceNumFolds
protected int m_NumExecutionSlots
protected transient ThreadPoolExecutor m_ExecutorPool
protected int m_Completed
protected int m_Failed
protected int m_NumSetups
protected Vector<Performance> m_Performances
| Constructor Detail |
|---|
public MultiSearch()
| Method Detail |
|---|
public String globalInfo()
protected String defaultClassifierString()
Overrides: defaultClassifierString in class weka.classifiers.SingleClassifierEnhancer
public Enumeration listOptions()
Specified by: listOptions in interface weka.core.OptionHandler
Overrides: listOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
public String[] getOptions()
Specified by: getOptions in interface weka.core.OptionHandler
Overrides: getOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
public void setOptions(String[] options)
throws Exception
Valid options are those listed in the class description.
Specified by: setOptions in interface weka.core.OptionHandler
Overrides: setOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
Parameters: options - the options to use
Throws: Exception - if setting of options fails
public void setClassifier(weka.classifiers.Classifier newClassifier)
Overrides: setClassifier in class weka.classifiers.SingleClassifierEnhancer
Parameters: newClassifier - the classifier to use.
public String searchParametersTipText()
public void setSearchParameters(AbstractParameter[] value)
Parameters: value - the parameters
public AbstractParameter[] getSearchParameters()
public String evaluationTipText()
public void setEvaluation(weka.core.SelectedTag value)
Parameters: value - the evaluation criterion
public weka.core.SelectedTag getEvaluation()
public String sampleSizePercentTipText()
public double getSampleSizePercent()
public void setSampleSizePercent(double value)
Parameters: value - the sample size for the initial space search.
public String logFileTipText()
public File getLogFile()
public void setLogFile(File value)
Parameters: value - the log file.
public String initialSpaceNumFoldsTipText()
public int getInitialSpaceNumFolds()
public void setInitialSpaceNumFolds(int value)
Parameters: value - the number of folds.
public String subsequentSpaceNumFoldsTipText()
public int getSubsequentSpaceNumFolds()
public void setSubsequentSpaceNumFolds(int value)
Parameters: value - the number of folds.
public String numExecutionSlotsTipText()
public void setNumExecutionSlots(int value)
Parameters: value - the number of slots to use.
public int getNumExecutionSlots()
public weka.classifiers.Classifier getBestClassifier()
public Enumeration enumerateMeasures()
Specified by: enumerateMeasures in interface weka.core.AdditionalMeasureProducer
public double getMeasure(String measureName)
Specified by: getMeasure in interface weka.core.AdditionalMeasureProducer
Parameters: measureName - the name of the measure to query for its value
public Point<Object> getValues()
public weka.core.Capabilities getCapabilities()
Specified by: getCapabilities in interface weka.classifiers.Classifier
Specified by: getCapabilities in interface weka.core.CapabilitiesHandler
Overrides: getCapabilities in class weka.classifiers.SingleClassifierEnhancer
protected String getCommandline(Object obj)
Parameters: obj - the object to create the commandline for
protected void log(String message)
Parameters: message - the message to print or store in a log file
protected void log(String message,
boolean onlyLog)
Parameters: message - the message to print or store in a log file
onlyLog - if true, the message is only written to the log file and not to stdout
protected String[] updateOption(String[] options,
String option,
String value)
throws Exception
Parameters: options - the current options
option - the option to set a new value for
value - the value to set
Throws: Exception - if something goes wrong
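A minimal sketch of the behaviour described for updateOption, assuming the usual Weka flag/value pairs (e.g. -R 1.0E-8) and assuming that a missing flag is appended together with its value. OptionUpdateSketch is hypothetical; unlike the documented method it is static, does not throw, and does not handle value-less flags such as -C.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OptionUpdateSketch {
  // Returns a copy of options in which the value following `option` is replaced by `value`.
  static String[] updateOption(String[] options, String option, String value) {
    List<String> result = new ArrayList<>(Arrays.asList(options));
    int index = result.indexOf(option);
    if (index < 0) {
      // Assumption: an absent option is appended with its new value.
      result.add(option);
      result.add(value);
    } else if (index + 1 < result.size()) {
      result.set(index + 1, value); // overwrite the existing value
    } else {
      result.add(value); // flag is the last element and has no value yet
    }
    return result.toArray(new String[0]);
  }
}
```

Working on a copy leaves the caller's array untouched, which keeps the original setup intact while new setups are generated.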
protected String logPerformances(Space space,
Vector<Performance> performances,
weka.core.Tag type)
Parameters: space - the current space to align the performances to
performances - the performances to align
type - the type of performance
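A minimal sketch of rendering aligned performances as a text table, restricted to a two-dimensional space (an assumption; MultiSearch supports an arbitrary number of parameters). NaN marks points that were not evaluated; PerformanceTableSketch and its layout are illustrative and do not reproduce the format of MultiSearch's log file.

```java
import java.util.Locale;

class PerformanceTableSketch {
  // perf[row][col] belongs to (yValues[row], xValues[col]); NaN marks an unevaluated point.
  static String table(double[] xValues, double[] yValues, double[][] perf) {
    StringBuilder sb = new StringBuilder();
    sb.append(String.format(Locale.ROOT, "%10s", "y\\x")); // header cell
    for (double x : xValues) {
      sb.append(String.format(Locale.ROOT, "%10.4g", x));
    }
    sb.append('\n');
    for (int row = 0; row < yValues.length; row++) {
      sb.append(String.format(Locale.ROOT, "%10.4g", yValues[row])); // row label
      for (int col = 0; col < xValues.length; col++) {
        double p = perf[row][col];
        sb.append(Double.isNaN(p)
            ? String.format(Locale.ROOT, "%10s", "-")
            : String.format(Locale.ROOT, "%10.4f", p));
      }
      sb.append('\n');
    }
    return sb.toString();
  }
}
```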
protected void logPerformances(Space space,
Vector<Performance> performances)
Parameters: space - the current space to align the performances to
performances - the performances to align
protected void startExecutorPool()
protected void stopExecutorPool()
protected void block(boolean doBlock)
Parameters: doBlock - whether to block or not
protected void completedEvaluation(Object obj,
boolean success)
Parameters: obj - the classifier or setup values that were attempted to be trained
success - whether the classifier trained successfully
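The -num-slots option, m_ExecutorPool, block and completedEvaluation suggest a fixed-size thread pool that counts finished and failed setups. The sketch below shows that pattern with java.util.concurrent; SlotPoolSketch, runAll and the counting policy (every finished task counts as completed, failures are additionally counted as failed) are assumptions, not MultiSearch's code.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SlotPoolSketch {
  // Counters in the spirit of m_Completed and m_Failed; atomic because worker threads update them.
  final AtomicInteger completed = new AtomicInteger();
  final AtomicInteger failed = new AtomicInteger();

  // Runs every evaluation on at most numSlots threads and blocks until all have finished.
  void runAll(List<Runnable> evaluations, int numSlots) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, numSlots));
    for (Runnable evaluation : evaluations) {
      pool.execute(() -> {
        boolean success;
        try {
          evaluation.run();
          success = true;
        } catch (RuntimeException e) {
          success = false;
        }
        completedEvaluation(success);
      });
    }
    pool.shutdown(); // accept no new tasks; already submitted ones still run
    try {
      pool.awaitTermination(1, TimeUnit.HOURS); // wait for all submitted tasks
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Records one finished evaluation; failures are counted in addition to completions.
  void completedEvaluation(boolean success) {
    completed.incrementAndGet();
    if (!success) {
      failed.incrementAndGet();
    }
  }
}
```

If awaitTermination times out, runAll returns early; a production version would check its boolean result.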
protected void addPerformance(Performance performance,
int folds)
Parameters: performance - the performance to add
folds - the number of folds
m_Failed
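m_Cache and addPerformance store performances for raw points in the space. A minimal sketch of such a cache, assuming the key is the point's raw coordinates plus the number of CV folds, so that a 2-fold and a 10-fold result for the same point do not collide. PerformanceCacheSketch is hypothetical and does not reproduce the API of Weka's PerformanceCache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerformanceCacheSketch {
  private final Map<List<Object>, Double> cache = new HashMap<>();

  // Key = raw point coordinates followed by the number of folds used to evaluate them.
  private static List<Object> key(Object[] point, int folds) {
    List<Object> parts = new ArrayList<>(Arrays.asList(point));
    parts.add(folds);
    return parts;
  }

  boolean isCached(Object[] point, int folds) {
    return cache.containsKey(key(point, folds));
  }

  // Returns null if the point has not been evaluated with this number of folds.
  Double get(Object[] point, int folds) {
    return cache.get(key(point, folds));
  }

  void add(Object[] point, int folds, double performance) {
    cache.put(key(point, folds), performance);
  }
}
```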
protected Point<Object> determineBestInSpace(Space space,
weka.core.Instances inst,
int folds)
throws Exception
Parameters: space - the space to work on
inst - the data to work with
folds - the number of folds for cross-validation; if <2, evaluation based on the training set is used
Throws: Exception - if setup or training fails
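determineBestInSpace evaluates with cross-validation and falls back to training-set evaluation when folds < 2. The sketch below only shows how instance indices could be split into test folds under that rule. It shuffles with a caller-supplied seed but, unlike Weka's Evaluation.crossValidateModel, which stratifies data with a nominal class, it does not stratify; FoldSplitterSketch is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class FoldSplitterSketch {
  // Returns the test indices of each fold. For folds < 2 there is a single "fold"
  // containing every instance: the training set itself is used for evaluation.
  static List<List<Integer>> testFolds(int numInstances, int folds, long seed) {
    List<Integer> indices = new ArrayList<>();
    for (int i = 0; i < numInstances; i++) {
      indices.add(i);
    }
    List<List<Integer>> result = new ArrayList<>();
    if (folds < 2) {
      result.add(indices); // no cross-validation
      return result;
    }
    Collections.shuffle(indices, new Random(seed)); // reproducible randomization
    for (int f = 0; f < folds; f++) {
      result.add(new ArrayList<>());
    }
    for (int i = 0; i < indices.size(); i++) {
      result.get(i % folds).add(indices.get(i)); // round-robin assignment
    }
    return result;
  }
}
```

Each instance appears in exactly one test fold; the corresponding training set for a fold is every index not in that fold.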
protected Point<Object> findBest(weka.core.Instances inst)
throws Exception
Parameters: inst - the training data
Throws: Exception - if something goes wrong
public void buildClassifier(weka.core.Instances data)
throws Exception
Specified by: buildClassifier in interface weka.classifiers.Classifier
Parameters: data - the training instances
Throws: Exception - if something goes wrong
public double[] distributionForInstance(weka.core.Instance instance)
throws Exception
Specified by: distributionForInstance in interface weka.classifiers.Classifier
Overrides: distributionForInstance in class weka.classifiers.AbstractClassifier
Parameters: instance - the test instance
Throws: Exception - if distribution can't be computed successfully
public String toString()
Overrides: toString in class Object
public String toSummaryString()
Specified by: toSummaryString in interface weka.core.Summarizable
public String getRevision()
Specified by: getRevision in interface weka.core.RevisionHandler
Overrides: getRevision in class weka.classifiers.AbstractClassifier
public static void main(String[] args)
Parameters: args - the options