weka.classifiers.meta
Class SubsetEnsemble

java.lang.Object
  extended by weka.classifiers.AbstractClassifier
      extended by weka.classifiers.SingleClassifierEnhancer
          extended by weka.classifiers.RandomizableSingleClassifierEnhancer
              extended by weka.classifiers.meta.SubsetEnsemble
All Implemented Interfaces:
Serializable, Cloneable, weka.classifiers.Classifier, weka.core.CapabilitiesHandler, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler

public class SubsetEnsemble
extends weka.classifiers.RandomizableSingleClassifierEnhancer

Generates an ensemble using the following approach:
- for each attribute apart from the class attribute:
  * create a new dataset with only this feature and the class attribute
  * remove all instances that contain a missing value
  * if no instances are left in the subset, don't build a classifier for this feature
  * if at least 1 instance is left in the subset, build the base classifier with it
If no classifier gets built at all, a ZeroR backup model, built on the full dataset, is used.
In addition to the default feature for a subset, a number of random features can be added to the subset before the classifier is trained.
At prediction time, the Vote meta-classifier (using the pre-built classifiers) is used to determine the class probabilities or regression value.
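
The per-attribute subset step described above can be sketched in plain Java. This is only an illustration, not the class's actual implementation: plain double arrays stand in for weka.core.Instances, missing values are encoded as Double.NaN, and the names SubsetSketch, subsetFor and usableAttributes are hypothetical. Treating a missing class value as a reason to drop the row is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain arrays stand in for weka.core.Instances. */
final class SubsetSketch {

    /**
     * Builds the single-feature subset for one attribute: keeps the
     * (feature, class) pair of every row in which neither value is missing.
     * Missing values are encoded as Double.NaN in this sketch.
     */
    static List<double[]> subsetFor(double[][] data, int attIndex, int classIndex) {
        List<double[]> subset = new ArrayList<>();
        for (double[] row : data) {
            double feature = row[attIndex];
            double cls = row[classIndex];
            if (!Double.isNaN(feature) && !Double.isNaN(cls)) {
                subset.add(new double[] {feature, cls});
            }
        }
        return subset;
    }

    /** Returns the attribute indices for which a base classifier would be built. */
    static List<Integer> usableAttributes(double[][] data, int classIndex) {
        List<Integer> usable = new ArrayList<>();
        int numAttributes = data.length == 0 ? 0 : data[0].length;
        for (int att = 0; att < numAttributes; att++) {
            if (att == classIndex) {
                continue; // the class attribute never gets its own subset
            }
            if (!subsetFor(data, att, classIndex).isEmpty()) {
                usable.add(att);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        double[][] data = {
            {1.0, nan, 0.0},
            {2.0, nan, 1.0},
            {nan, nan, 1.0},
        };
        // Attribute 1 is missing in every row, so only attribute 0 is usable.
        System.out.println(usableAttributes(data, 2)); // prints [0]
    }
}
```

If the returned list is empty, the description above implies that the ZeroR backup model would be used instead of an ensemble.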

Valid options are:

 -num-slots <num>
  Number of execution slots.
  (default: 1 - i.e. no parallelism)
 -combination-rule <AVG|PROD|MAJ|MIN|MAX|MED>
  The combination rule to use
  (default: AVG)
 -num-random <num>
  Number of random features to use in addition.
  (default: 0)
 -S <num>
  Random number seed.
  (default: 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.rules.ZeroR)
 Options specific to classifier weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
Options after -- are passed to the designated classifier.
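
As a rough illustration of how the flags listed above fit together, the following sketch assembles an option array in plain Java. The helper class SubsetEnsembleOptions and its build method are hypothetical and not part of Weka; only the flag names and the "--" separator come from the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that only assembles strings; it does not depend on Weka. */
final class SubsetEnsembleOptions {

    /** Assembles an option array using the flags documented above. */
    static String[] build(int numSlots, String rule, int numRandom, int seed,
                          String baseClassifier, String... baseOptions) {
        List<String> opts = new ArrayList<>(Arrays.asList(
            "-num-slots", Integer.toString(numSlots),
            "-combination-rule", rule,
            "-num-random", Integer.toString(numRandom),
            "-S", Integer.toString(seed),
            "-W", baseClassifier));
        if (baseOptions.length > 0) {
            opts.add("--"); // everything after "--" is passed to the base classifier
            opts.addAll(Arrays.asList(baseOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = build(2, "AVG", 1, 1, "weka.classifiers.rules.ZeroR", "-D");
        System.out.println(String.join(" ", options));
    }
}
```

With Weka on the classpath, an array like this could presumably be handed to setOptions(String[]) on a SubsetEnsemble instance.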

Version:
$Revision: 4521 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Field Summary
protected  weka.classifiers.rules.ZeroR m_BackupModel
          The backup classifier, in case no ensemble could be constructed at prediction time.
protected  weka.classifiers.Classifier[] m_Classifiers
          the actual classifiers in use.
protected  int m_CombinationRule
          Combination Rule variable.
protected  int m_Completed
          The number of classifiers completed so far
protected  weka.core.Instances m_Data
          For holding the original training set temporarily.
protected  ThreadPoolExecutor m_ExecutorPool
          Pool of threads to train models with
protected  int m_Failed
          The number of classifiers that experienced a failure of some sort during construction.
protected  weka.core.Instances m_Header
          The header of the training set.
protected  int m_NumExecutionSlots
          The number of threads to have executing at any one time
protected  int m_NumRandomFeatures
          the number of random features to use (in addition to base attribute).
 
Fields inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer
m_Seed
 
Fields inherited from class weka.classifiers.SingleClassifierEnhancer
m_Classifier
 
Fields inherited from class weka.classifiers.AbstractClassifier
m_Debug
 
Constructor Summary
SubsetEnsemble()
           
 
Method Summary
 void buildClassifier(weka.core.Instances data)
          Stub method for building the classifiers
protected  void buildClassifiers()
          Does the actual construction of the ensemble.
 double classifyInstance(weka.core.Instance instance)
          Classifies the given test instance.
 String combinationRuleTipText()
          Returns the tip text for this property.
protected  void completedClassifier(int index, boolean success)
          Records the completion of the training of a single classifier.
protected  weka.classifiers.Classifier constructEnsemble(weka.core.Instance instance)
          Constructs the ensemble.
 double[] distributionForInstance(weka.core.Instance instance)
          Predicts the class memberships for a given instance.
protected  int getActualIndex(int index)
          Returns the actual index in the data of the feature attribute.
 weka.core.SelectedTag getCombinationRule()
          Gets the combination rule used
protected  weka.filters.Filter getFilter(int index, int seed, boolean withMissing)
          Gets a filter for a particular index.
 int getNumExecutionSlots()
          Get the number of execution slots (threads) to use for building the members of the ensemble.
 int getNumRandomFeatures()
          Returns the number of additional random features to use.
 String[] getOptions()
          Gets the current settings of the classifier.
 String getRevision()
          Returns the revision string.
protected  weka.core.Instances getTrainingSet(int index, int seed)
          Gets a training set for a particular index.
 String globalInfo()
          Returns a string describing the classifier.
 Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(String[] args)
          Main method for running this class from commandline.
 String numExecutionSlotsTipText()
          Returns the tip text for this property.
 String numRandomFeaturesTipText()
          Returns the tip text for this property.
 void setCombinationRule(weka.core.SelectedTag value)
          Sets the combination rule to use.
 void setNumExecutionSlots(int value)
          Set the number of execution slots (threads) to use for building the members of the ensemble.
 void setNumRandomFeatures(int value)
          Set the number of additional random features to use.
 void setOptions(String[] options)
          Parses a given list of options.
protected  void startExecutorPool()
          Start the pool of execution threads.
 String toString()
          Returns a string representation of the classifier.
 
Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, defaultClassifierString, getCapabilities, getClassifier, getClassifierSpec, setClassifier
 
Methods inherited from class weka.classifiers.AbstractClassifier
debugTipText, forName, getDebug, makeCopies, makeCopy, runClassifier, setDebug
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Classifiers

protected weka.classifiers.Classifier[] m_Classifiers
the actual classifiers in use.


m_NumExecutionSlots

protected int m_NumExecutionSlots
The number of threads to have executing at any one time


m_CombinationRule

protected int m_CombinationRule
Combination Rule variable.


m_NumRandomFeatures

protected int m_NumRandomFeatures
the number of random features to use (in addition to base attribute).


m_ExecutorPool

protected transient ThreadPoolExecutor m_ExecutorPool
Pool of threads to train models with


m_Completed

protected int m_Completed
The number of classifiers completed so far


m_Failed

protected int m_Failed
The number of classifiers that experienced a failure of some sort during construction.


m_Data

protected weka.core.Instances m_Data
For holding the original training set temporarily.


m_Header

protected weka.core.Instances m_Header
The header of the training set.


m_BackupModel

protected weka.classifiers.rules.ZeroR m_BackupModel
The backup classifier, in case no ensemble could be constructed at prediction time.

Constructor Detail

SubsetEnsemble

public SubsetEnsemble()
Method Detail

globalInfo

public String globalInfo()
Returns a string describing the classifier.

Returns:
a description suitable for displaying in the gui

listOptions

public Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface weka.core.OptionHandler
Overrides:
listOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(String[] options)
                throws Exception
Parses a given list of options.

Valid options are:

 -num-slots <num>
  Number of execution slots.
  (default: 1 - i.e. no parallelism)
 -combination-rule <AVG|PROD|MAJ|MIN|MAX|MED>
  The combination rule to use
  (default: AVG)
 -num-random <num>
  Number of random features to use in addition.
  (default: 0)
 -S <num>
  Random number seed.
  (default: 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.rules.ZeroR)
 Options specific to classifier weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
Options after -- are passed to the designated classifier.

Specified by:
setOptions in interface weka.core.OptionHandler
Overrides:
setOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
Parameters:
options - the list of options as an array of strings
Throws:
Exception - if an option is not supported

getOptions

public String[] getOptions()
Gets the current settings of the classifier.

Specified by:
getOptions in interface weka.core.OptionHandler
Overrides:
getOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
Returns:
an array of strings suitable for passing to setOptions

setNumExecutionSlots

public void setNumExecutionSlots(int value)
Set the number of execution slots (threads) to use for building the members of the ensemble.

Parameters:
value - the number of slots to use.

getNumExecutionSlots

public int getNumExecutionSlots()
Get the number of execution slots (threads) to use for building the members of the ensemble.

Returns:
the number of slots to use

numExecutionSlotsTipText

public String numExecutionSlotsTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCombinationRule

public void setCombinationRule(weka.core.SelectedTag value)
Sets the combination rule to use.

Parameters:
value - the combination rule method to use
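
To make the combination rules more concrete, here is a minimal plain-Java sketch of how several per-classifier class distributions could be merged under AVG, PROD, MIN and MAX. It is not the Vote implementation: the class CombinationRuleSketch, its Rule enum and the final normalization step are assumptions made for illustration, and MAJ and MED are left out.

```java
import java.util.Arrays;

/** Illustrative only: a simplified take on four of the combination rules. */
final class CombinationRuleSketch {

    enum Rule { AVG, PROD, MIN, MAX }

    /**
     * Combines one distribution per ensemble member into a single one.
     * The result is normalized to sum to 1 whenever its sum is positive,
     * so AVG can be computed as a plain sum here.
     */
    static double[] combine(double[][] dists, Rule rule) {
        int numClasses = dists[0].length;
        double[] out = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            double acc = dists[0][c];
            for (int m = 1; m < dists.length; m++) {
                double p = dists[m][c];
                switch (rule) {
                    case AVG:  acc += p; break;
                    case PROD: acc *= p; break;
                    case MIN:  acc = Math.min(acc, p); break;
                    case MAX:  acc = Math.max(acc, p); break;
                }
            }
            out[c] = acc;
        }
        double sum = Arrays.stream(out).sum();
        if (sum > 0) {
            for (int c = 0; c < numClasses; c++) {
                out[c] /= sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] dists = {{0.8, 0.2}, {0.4, 0.6}};
        for (Rule rule : Rule.values()) {
            System.out.println(rule + " -> " + Arrays.toString(combine(dists, rule)));
        }
    }
}
```

For the two example distributions, the first class wins under all four rules, but the margin differs: PROD sharpens the agreement between members, while MAX flattens it.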

getCombinationRule

public weka.core.SelectedTag getCombinationRule()
Gets the combination rule used

Returns:
the combination rule used

combinationRuleTipText

public String combinationRuleTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumRandomFeatures

public void setNumRandomFeatures(int value)
Set the number of additional random features to use.

Parameters:
value - the number of random features

getNumRandomFeatures

public int getNumRandomFeatures()
Returns the number of additional random features to use.

Returns:
the number of random features

numRandomFeaturesTipText

public String numRandomFeaturesTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

startExecutorPool

protected void startExecutorPool()
Start the pool of execution threads.


buildClassifiers

protected void buildClassifiers()
                         throws Exception
Does the actual construction of the ensemble.

Throws:
Exception - if something goes wrong during the training process

completedClassifier

protected void completedClassifier(int index,
                                   boolean success)
Records the completion of the training of a single classifier. Unblocks if all classifiers have been trained.

Parameters:
index - the index of the classifier that has completed
success - whether the classifier trained successfully
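
The interplay between the executor pool and the completion bookkeeping can be sketched as follows. This is an assumption-laden illustration rather than the class's code: ParallelBuildSketch, runAll and completedTask are hypothetical names, generic Runnable tasks stand in for classifier training, and the 120 second keep-alive is an arbitrary choice.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative only: fixed-size pool plus completed/failed counters. */
final class ParallelBuildSketch {
    private final int numSlots;
    private int total;
    private int completed;
    private int failed;

    ParallelBuildSketch(int numSlots) {
        this.numSlots = numSlots;
    }

    /** Records one finished task and wakes the waiting caller once all are done. */
    private synchronized void completedTask(boolean success) {
        if (success) {
            completed++;
        } else {
            failed++;
        }
        if (completed + failed == total) {
            notifyAll();
        }
    }

    /** Runs all tasks on the pool and blocks until each one has reported back. */
    synchronized int[] runAll(Runnable[] tasks) {
        total = tasks.length;
        completed = 0;
        failed = 0;
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            numSlots, numSlots, 120, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            for (Runnable task : tasks) {
                pool.execute(() -> {
                    boolean ok = true;
                    try {
                        task.run();
                    } catch (RuntimeException e) {
                        ok = false; // a failed build is counted, not rethrown
                    }
                    completedTask(ok);
                });
            }
            while (completed + failed < total) {
                wait(); // releases the monitor so workers can report back
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return new int[] {completed, failed};
    }

    public static void main(String[] args) {
        Runnable ok = () -> { };
        Runnable bad = () -> { throw new IllegalStateException("boom"); };
        int[] result = new ParallelBuildSketch(2).runAll(new Runnable[] {ok, bad, ok});
        System.out.println("completed=" + result[0] + " failed=" + result[1]);
    }
}
```

With a single slot the tasks simply run one after another, which matches the "no parallelism" default of -num-slots.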

getActualIndex

protected int getActualIndex(int index)
                      throws Exception
Returns the actual index in the data of the feature attribute.

Parameters:
index - the index for the requested attribute
Returns:
the actual attribute index for the supplied index
Throws:
Exception - if something goes wrong

getFilter

protected weka.filters.Filter getFilter(int index,
                                        int seed,
                                        boolean withMissing)
                                 throws Exception
Gets a filter for a particular index.

Parameters:
index - the index for the requested filter
seed - the seed value to use for determining the additional random features
withMissing - whether to include the RemoveInstancesWithMissingValue filter
Returns:
the filter for the supplied index
Throws:
Exception - if something goes wrong

getTrainingSet

protected weka.core.Instances getTrainingSet(int index,
                                             int seed)
                                      throws Exception
Gets a training set for a particular index.

Parameters:
index - the index for the requested training set
seed - the seed value to use for determining the additional random features
Returns:
the training set for the supplied index
Throws:
Exception - if something goes wrong
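
The role of the seed parameter can be illustrated with a small plain-Java sketch that picks the additional random features for one base attribute. How SubsetEnsemble actually selects them is not documented here, so the shuffle-based approach and the names RandomFeatureSketch and chooseAttributes are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative only: seeded choice of extra attributes for one subset. */
final class RandomFeatureSketch {

    /**
     * Returns the base attribute followed by up to numRandom further
     * attributes, drawn without replacement from all attributes except
     * the class attribute and the base attribute itself.
     */
    static List<Integer> chooseAttributes(int numAttributes, int classIndex,
                                          int baseIndex, int numRandom, int seed) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            if (i != classIndex && i != baseIndex) {
                candidates.add(i);
            }
        }
        // The same seed always yields the same order, and thus the same subset.
        Collections.shuffle(candidates, new Random(seed));
        List<Integer> chosen = new ArrayList<>();
        chosen.add(baseIndex);
        chosen.addAll(candidates.subList(0, Math.min(numRandom, candidates.size())));
        return chosen;
    }

    public static void main(String[] args) {
        // 6 attributes, class at index 5, base attribute 0, two extra features.
        System.out.println(chooseAttributes(6, 5, 0, 2, 1));
    }
}
```

Using the same seed when building the training set and when filtering a test instance would keep both on the same attribute subset, which is presumably why getFilter and getTrainingSet both accept it.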

buildClassifier

public void buildClassifier(weka.core.Instances data)
                     throws Exception
Stub method for building the classifiers

Parameters:
data - the training data to be used for generating the ensemble
Throws:
Exception - if the classifier could not be built successfully

constructEnsemble

protected weka.classifiers.Classifier constructEnsemble(weka.core.Instance instance)
Constructs the ensemble.

Parameters:
instance - the instance to base the construction on

classifyInstance

public double classifyInstance(weka.core.Instance instance)
                        throws Exception
Classifies the given test instance. The instance has to belong to a dataset when it's being classified.

Specified by:
classifyInstance in interface weka.classifiers.Classifier
Overrides:
classifyInstance in class weka.classifiers.AbstractClassifier
Parameters:
instance - the instance to be classified
Returns:
the predicted most likely class for the instance or Utils.missingValue() if no prediction is made
Throws:
Exception - if an error occurred during the prediction

distributionForInstance

public double[] distributionForInstance(weka.core.Instance instance)
                                 throws Exception
Predicts the class memberships for a given instance. If an instance is unclassified, the returned array elements must be all zero. If the class is numeric, the array must consist of only one element, which contains the predicted value.

Specified by:
distributionForInstance in interface weka.classifiers.Classifier
Overrides:
distributionForInstance in class weka.classifiers.AbstractClassifier
Parameters:
instance - the instance to be classified
Returns:
an array containing the estimated membership probabilities of the test instance in each class or the numeric prediction
Throws:
Exception - if distribution could not be computed successfully
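
The contract between distributionForInstance and classifyInstance can be shown with a short plain-Java sketch. It is a simplified illustration under stated assumptions: PredictionSketch and classify are hypothetical names, and Double.NaN is used as the missing-value marker, which is the value Utils.missingValue() represents in Weka.

```java
/** Illustrative only: turning a distribution into a single prediction. */
final class PredictionSketch {

    /**
     * For a numeric class the single array element is the prediction.
     * For a nominal class the index of the largest probability is returned,
     * or NaN (the missing-value marker) if all elements are zero.
     */
    static double classify(double[] dist, boolean numericClass) {
        if (numericClass) {
            return dist[0];
        }
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return dist[best] > 0 ? best : Double.NaN;
    }

    public static void main(String[] args) {
        System.out.println(classify(new double[] {0.2, 0.7, 0.1}, false)); // prints 1.0
        System.out.println(classify(new double[] {0.0, 0.0, 0.0}, false)); // prints NaN
        System.out.println(classify(new double[] {3.5}, true));            // prints 3.5
    }
}
```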

toString

public String toString()
Returns a string representation of the classifier.

Overrides:
toString in class Object
Returns:
the string representation

getRevision

public String getRevision()
Returns the revision string.

Specified by:
getRevision in interface weka.core.RevisionHandler
Overrides:
getRevision in class weka.classifiers.AbstractClassifier
Returns:
the revision

main

public static void main(String[] args)
Main method for running this class from commandline.

Parameters:
args - the options


Copyright © 2012 University of Waikato, Hamilton, NZ. All Rights Reserved.