Class SubsetEnsemble

  • All Implemented Interfaces:
    Serializable, Cloneable, weka.classifiers.Classifier, weka.core.BatchPredictor, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler

    public class SubsetEnsemble
    extends weka.classifiers.RandomizableSingleClassifierEnhancer
    Generates an ensemble using the following approach:
    - for each attribute apart from the class attribute:
      * create a new dataset with only this feature and the class attribute
      * remove all instances that contain a missing value
      * if no instances are left in the subset, don't build a classifier for this feature
      * if at least one instance is left in the subset, build the base classifier with it
    If no classifier gets built at all, a ZeroR model built on the full dataset is used as the backup model.
    In addition to the default feature for a subset, a number of random features can be added to the subset before the classifier is trained.
    At prediction time, the Vote meta-classifier (using the pre-built classifiers) is used to determine the class probabilities or regression value.
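
The per-attribute procedure described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the Weka implementation: the class `SubsetSketch`, its method names, and the encoding of missing values as `Double.NaN` in a `double[][]` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the per-attribute subset step; not the Weka implementation. */
class SubsetSketch {

  /** Reduces the data to {feature, class} and drops rows with a missing (NaN) value. */
  static List<double[]> subsetFor(double[][] data, int featureIndex, int classIndex) {
    List<double[]> subset = new ArrayList<>();
    for (double[] row : data) {
      double feature = row[featureIndex];
      double label = row[classIndex];
      if (Double.isNaN(feature) || Double.isNaN(label)) {
        continue; // instance contains a missing value, so it is removed
      }
      subset.add(new double[] {feature, label});
    }
    return subset;
  }

  /** Returns the attribute indices whose subset is non-empty, i.e. those that get a model. */
  static List<Integer> trainableFeatures(double[][] data, int classIndex) {
    List<Integer> trainable = new ArrayList<>();
    int numAttributes = data.length == 0 ? 0 : data[0].length;
    for (int att = 0; att < numAttributes; att++) {
      if (att == classIndex) {
        continue; // the class attribute never gets its own subset
      }
      if (!subsetFor(data, att, classIndex).isEmpty()) {
        trainable.add(att);
      }
    }
    return trainable;
  }

  public static void main(String[] args) {
    double nan = Double.NaN;
    double[][] data = {
      {1.0, nan, nan, 0.0},
      {2.0, 5.0, nan, 1.0},
      {nan, 6.0, nan, 1.0},
    };
    List<Integer> trainable = trainableFeatures(data, 3);
    System.out.println("Features with a model: " + trainable);
    // An empty list would correspond to falling back to the backup model.
    System.out.println("Backup model needed: " + trainable.isEmpty());
  }
}
```

In this toy data the third attribute is missing everywhere, so no classifier would be built for it, while the first two attributes each keep two instances.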

    Valid options are:

     -num-slots <num>
      Number of execution slots.
      (default: 1 - i.e. no parallelism)
     -combination-rule <AVG|PROD|MAJ|MIN|MAX|MED>
      The combination rule to use
      (default: AVG)
     -num-random <num>
      Number of random features to use in addition.
      (default: 0)
     -S <num>
      Random number seed.
      (default: 1)
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     -W
      Full name of base classifier.
      (default: weka.classifiers.rules.ZeroR)
     Options specific to classifier weka.classifiers.rules.ZeroR:
     
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
    Options after -- are passed to the designated classifier.
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected weka.classifiers.rules.ZeroR m_BackupModel
      The backup classifier, in case no ensemble could be constructed at prediction time.
      protected weka.classifiers.Classifier[] m_Classifiers
      the actual classifiers in use.
      protected int m_CombinationRule
      Combination Rule variable.
      protected int m_Completed
      The number of classifiers completed so far
      protected weka.core.Instances m_Data
      For holding the original training set temporarily.
      protected ThreadPoolExecutor m_ExecutorPool
      Pool of threads to train models with
      protected int m_Failed
      The number of classifiers that experienced a failure of some sort during construction.
      protected weka.core.Instances m_Header
      The header of the training set.
      protected int m_NumExecutionSlots
      The number of threads to have executing at any one time
      protected int m_NumRandomFeatures
      the number of random features to use (in addition to base attribute).
      • Fields inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer

        m_Seed
      • Fields inherited from class weka.classifiers.SingleClassifierEnhancer

        m_Classifier
      • Fields inherited from class weka.classifiers.AbstractClassifier

        BATCH_SIZE_DEFAULT, m_BatchSize, m_Debug, m_DoNotCheckCapabilities, m_numDecimalPlaces, NUM_DECIMAL_PLACES_DEFAULT
    • Constructor Summary

      Constructors 
      Constructor Description
      SubsetEnsemble()  
    • Method Summary

      Modifier and Type Method Description
      void buildClassifier​(weka.core.Instances data)
      Stub method for building the classifiers.
      protected void buildClassifiers()
      Does the actual construction of the ensemble.
      double classifyInstance​(weka.core.Instance instance)
      Classifies the given test instance.
      String combinationRuleTipText()
      Returns the tip text for this property.
      protected void completedClassifier​(int index, boolean success)
      Records the completion of the training of a single classifier.
      protected weka.classifiers.Classifier constructEnsemble​(weka.core.Instance instance)
      Constructs the ensemble.
      double[] distributionForInstance​(weka.core.Instance instance)
      Predicts the class memberships for a given instance.
      protected int getActualIndex​(int index)
      Returns the actual index in the data of the feature attribute.
      weka.core.SelectedTag getCombinationRule()
      Gets the combination rule used
      protected weka.filters.Filter getFilter​(int index, int seed, boolean withMissing)
      Gets a filter for a particular index.
      int getNumExecutionSlots()
      Get the number of execution slots (threads) to use for building the members of the ensemble.
      int getNumRandomFeatures()
      Returns the number of additional random features to use.
      String[] getOptions()
      Gets the current settings of the classifier.
      String getRevision()
      Returns the revision string.
      protected weka.core.Instances getTrainingSet​(int index, int seed)
      Gets a training set for a particular index.
      String globalInfo()
      Returns a string describing the classifier.
      Enumeration listOptions()
      Returns an enumeration describing the available options.
      static void main​(String[] args)
      Main method for running this class from commandline.
      String numExecutionSlotsTipText()
      Returns the tip text for this property.
      String numRandomFeaturesTipText()
      Returns the tip text for this property.
      void setCombinationRule​(weka.core.SelectedTag value)
      Sets the combination rule to use.
      void setNumExecutionSlots​(int value)
      Set the number of execution slots (threads) to use for building the members of the ensemble.
      void setNumRandomFeatures​(int value)
      Set the number of additional random features to use.
      void setOptions​(String[] options)
      Parses a given list of options.
      protected void startExecutorPool()
      Start the pool of execution threads.
      String toString()
      Returns a string representation of the classifier.
      • Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer

        getSeed, seedTipText, setSeed
      • Methods inherited from class weka.classifiers.SingleClassifierEnhancer

        classifierTipText, defaultClassifierOptions, defaultClassifierString, getCapabilities, getClassifier, getClassifierSpec, postExecution, preExecution, setClassifier
      • Methods inherited from class weka.classifiers.AbstractClassifier

        batchSizeTipText, debugTipText, distributionsForInstances, doNotCheckCapabilitiesTipText, forName, getBatchSize, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, implementsMoreEfficientBatchPrediction, makeCopies, makeCopy, numDecimalPlacesTipText, run, runClassifier, setBatchSize, setDebug, setDoNotCheckCapabilities, setNumDecimalPlaces
    • Field Detail

      • m_Classifiers

        protected weka.classifiers.Classifier[] m_Classifiers
        the actual classifiers in use.
      • m_NumExecutionSlots

        protected int m_NumExecutionSlots
        The number of threads to have executing at any one time
      • m_CombinationRule

        protected int m_CombinationRule
        Combination Rule variable.
      • m_NumRandomFeatures

        protected int m_NumRandomFeatures
        the number of random features to use (in addition to base attribute).
      • m_ExecutorPool

        protected transient ThreadPoolExecutor m_ExecutorPool
        Pool of threads to train models with
      • m_Completed

        protected int m_Completed
        The number of classifiers completed so far
      • m_Failed

        protected int m_Failed
        The number of classifiers that experienced a failure of some sort during construction.
      • m_Data

        protected weka.core.Instances m_Data
        For holding the original training set temporarily.
      • m_Header

        protected weka.core.Instances m_Header
        The header of the training set.
      • m_BackupModel

        protected weka.classifiers.rules.ZeroR m_BackupModel
        The backup classifier, in case no ensemble could be constructed at prediction time.
    • Constructor Detail

      • SubsetEnsemble

        public SubsetEnsemble()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the classifier.
        Returns:
        a description suitable for displaying in the gui
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions​(String[] options)
                        throws Exception
        Parses a given list of options.

        Valid options are:

         -num-slots <num>
          Number of execution slots.
          (default: 1 - i.e. no parallelism)
         -combination-rule <AVG|PROD|MAJ|MIN|MAX|MED>
          The combination rule to use
          (default: AVG)
         -num-random <num>
          Number of random features to use in addition.
          (default: 0)
         -S <num>
          Random number seed.
          (default: 1)
         -D
          If set, classifier is run in debug mode and
          may output additional info to the console
         -W
          Full name of base classifier.
          (default: weka.classifiers.rules.ZeroR)
         Options specific to classifier weka.classifiers.rules.ZeroR:
         
         -D
          If set, classifier is run in debug mode and
          may output additional info to the console
        Options after -- are passed to the designated classifier.
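
As an illustration of how such an option array is laid out, the following plain-Java sketch splits an array at the first `--` into the part meant for the ensemble and the part meant for the base classifier. The class `OptionSplitSketch` and its methods are hypothetical helpers written for this example and are not part of this class; the choice of `weka.classifiers.trees.J48` with `-C 0.25` as the base classifier is likewise just an example.

```java
import java.util.Arrays;

/** Sketch: splitting an option array at "--"; hypothetical helper, not the Weka parser. */
class OptionSplitSketch {

  /** Position of the first "--", or options.length if there is none. */
  private static int indexOfSeparator(String[] options) {
    for (int i = 0; i < options.length; i++) {
      if ("--".equals(options[i])) {
        return i;
      }
    }
    return options.length;
  }

  /** Options before the first "--": these address the ensemble itself. */
  static String[] ownOptions(String[] options) {
    return Arrays.copyOfRange(options, 0, indexOfSeparator(options));
  }

  /** Options after the first "--": these are handed to the base classifier. */
  static String[] baseOptions(String[] options) {
    int split = indexOfSeparator(options);
    if (split == options.length) {
      return new String[0];
    }
    return Arrays.copyOfRange(options, split + 1, options.length);
  }

  public static void main(String[] args) {
    String[] options = {
      "-num-slots", "2",
      "-combination-rule", "AVG",
      "-num-random", "1",
      "-S", "42",
      "-W", "weka.classifiers.trees.J48",
      "--", "-C", "0.25"
    };
    System.out.println("Ensemble options: " + Arrays.toString(ownOptions(options)));
    System.out.println("Base classifier options: " + Arrays.toString(baseOptions(options)));
  }
}
```

An array laid out like `options` above is the kind of value that would be passed to `setOptions`.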

        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • getOptions

        public String[] getOptions()
        Gets the current settings of the classifier.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
        Returns:
        an array of strings suitable for passing to setOptions
      • setNumExecutionSlots

        public void setNumExecutionSlots​(int value)
        Set the number of execution slots (threads) to use for building the members of the ensemble.
        Parameters:
        value - the number of slots to use.
      • getNumExecutionSlots

        public int getNumExecutionSlots()
        Get the number of execution slots (threads) to use for building the members of the ensemble.
        Returns:
        the number of slots to use
      • numExecutionSlotsTipText

        public String numExecutionSlotsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setCombinationRule

        public void setCombinationRule​(weka.core.SelectedTag value)
        Sets the combination rule to use.
        Parameters:
        value - the combination rule method to use
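
This page lists the rule names (AVG, PROD, MAJ, MIN, MAX, MED) but not their definitions. Assuming AVG, PROD, MIN and MAX combine the per-class probabilities of the ensemble members as their names suggest, followed by normalization, the idea can be sketched in plain Java. The class `CombinationSketch` is hypothetical, and MAJ and MED are left out of this sketch.

```java
import java.util.Arrays;

/** Sketch of combining class distributions; an assumption about the rules, not Weka code. */
class CombinationSketch {

  enum Rule { AVG, PROD, MIN, MAX }

  /** Combines one distribution per ensemble member into a single normalized distribution. */
  static double[] combine(double[][] dists, Rule rule) {
    int numClasses = dists[0].length;
    double[] combined = new double[numClasses];
    for (int c = 0; c < numClasses; c++) {
      double acc = dists[0][c];
      for (int m = 1; m < dists.length; m++) {
        double p = dists[m][c];
        switch (rule) {
          case AVG:  acc += p; break;               // summed here, normalized below
          case PROD: acc *= p; break;
          case MIN:  acc = Math.min(acc, p); break;
          case MAX:  acc = Math.max(acc, p); break;
        }
      }
      combined[c] = acc;
    }
    // Normalize so the result sums to 1; for AVG this is equivalent to
    // averaging first and normalizing afterwards.
    double sum = 0.0;
    for (double v : combined) {
      sum += v;
    }
    if (sum > 0.0) {
      for (int c = 0; c < numClasses; c++) {
        combined[c] /= sum;
      }
    }
    return combined;
  }

  public static void main(String[] args) {
    double[][] dists = {{0.8, 0.2}, {0.6, 0.4}};
    for (Rule rule : Rule.values()) {
      System.out.println(rule + ": " + Arrays.toString(combine(dists, rule)));
    }
  }
}
```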
      • getCombinationRule

        public weka.core.SelectedTag getCombinationRule()
        Gets the combination rule used
        Returns:
        the combination rule used
      • combinationRuleTipText

        public String combinationRuleTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumRandomFeatures

        public void setNumRandomFeatures​(int value)
        Set the number of additional random features to use.
        Parameters:
        value - the number of random features
      • getNumRandomFeatures

        public int getNumRandomFeatures()
        Returns the number of additional random features to use.
        Returns:
        the number of random features
      • numRandomFeaturesTipText

        public String numRandomFeaturesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • startExecutorPool

        protected void startExecutorPool()
        Start the pool of execution threads.
      • buildClassifiers

        protected void buildClassifiers()
                                 throws Exception
        Does the actual construction of the ensemble.
        Throws:
        Exception - if something goes wrong during the training process
      • completedClassifier

        protected void completedClassifier​(int index,
                                           boolean success)
        Records the completion of the training of a single classifier. Unblocks if all classifiers have been trained.
        Parameters:
        index - the index of the classifier that has completed
        success - whether the classifier trained successfully
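
The bookkeeping described here (counting completed and failed jobs, then unblocking once every job has reported) can be sketched with a `ThreadPoolExecutor` and a monitor. This is a minimal sketch under assumptions: the class `CompletionSketch`, its method names, and the rule that every third job fails are invented for illustration and do not reflect how this class is actually implemented.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Sketch of counting finished training jobs and unblocking once all have reported. */
class CompletionSketch {
  private final int total;
  private int completed; // jobs that finished successfully
  private int failed;    // jobs that reported a failure

  CompletionSketch(int total) {
    this.total = total;
  }

  /** Records one finished job and wakes the waiting thread when all jobs are done. */
  synchronized void completedJob(int index, boolean success) {
    if (success) {
      completed++;
    } else {
      failed++;
    }
    if (completed + failed == total) {
      notifyAll();
    }
  }

  /** Blocks the caller until every job has reported success or failure. */
  synchronized void awaitAll() {
    try {
      while (completed + failed < total) {
        wait();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
  }

  synchronized int getCompleted() { return completed; }

  synchronized int getFailed() { return failed; }

  /** Runs numJobs dummy jobs on numSlots threads (numSlots must be at least 1). */
  static CompletionSketch runAll(int numJobs, int numSlots) {
    CompletionSketch tracker = new CompletionSketch(numJobs);
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        numSlots, numSlots, 120, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    try {
      for (int i = 0; i < numJobs; i++) {
        final int index = i;
        // Hypothetical outcome: every third job (index 0, 3, ...) counts as failed.
        pool.execute(() -> tracker.completedJob(index, index % 3 != 0));
      }
      tracker.awaitAll();
    } finally {
      pool.shutdown();
    }
    return tracker;
  }

  public static void main(String[] args) {
    CompletionSketch tracker = runAll(6, 2);
    System.out.println("completed=" + tracker.getCompleted() + ", failed=" + tracker.getFailed());
  }
}
```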
      • getActualIndex

        protected int getActualIndex​(int index)
                              throws Exception
        Returns the actual index in the data of the feature attribute.
        Parameters:
        index - the index for the requested attribute
        Returns:
        the actual attribute index for the supplied index
        Throws:
        Exception - if something goes wrong
      • getFilter

        protected weka.filters.Filter getFilter​(int index,
                                                int seed,
                                                boolean withMissing)
                                         throws Exception
        Gets a filter for a particular index.
        Parameters:
        index - the index for the requested filter
        seed - the seed value to use for determining the additional random features
        withMissing - whether to include the RemoveInstancesWithMissingValue filter
        Returns:
        the filter for the supplied index
        Throws:
        Exception - if something goes wrong
      • getTrainingSet

        protected weka.core.Instances getTrainingSet​(int index,
                                                     int seed)
                                              throws Exception
        Gets a training set for a particular index.
        Parameters:
        index - the index for the requested training set
        seed - the seed value to use for determining the additional random features
        Returns:
        the training set for the supplied index
        Throws:
        Exception - if something goes wrong
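
How a seed might determine the additional random features can be sketched as follows. The source does not describe the selection procedure, so this is an assumption: the sketch shuffles the remaining non-class attributes with a seeded `java.util.Random` and takes the first few. The class `RandomFeatureSketch` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of picking extra random features for a subset; an assumed procedure. */
class RandomFeatureSketch {

  /**
   * Returns the base attribute followed by up to numRandom other attributes,
   * never including the class attribute. The same seed gives the same choice.
   */
  static List<Integer> chooseFeatures(
      int baseIndex, int numAttributes, int classIndex, int numRandom, int seed) {
    List<Integer> candidates = new ArrayList<>();
    for (int i = 0; i < numAttributes; i++) {
      if (i != baseIndex && i != classIndex) {
        candidates.add(i);
      }
    }
    Collections.shuffle(candidates, new Random(seed));

    List<Integer> chosen = new ArrayList<>();
    chosen.add(baseIndex);
    int extra = Math.max(0, Math.min(numRandom, candidates.size()));
    chosen.addAll(candidates.subList(0, extra));
    return chosen;
  }

  public static void main(String[] args) {
    // Six attributes, the last one being the class; base feature 2 plus two random ones.
    System.out.println(chooseFeatures(2, 6, 5, 2, 1));
  }
}
```

Tying the choice to the seed keeps the subsets reproducible: the same seed always yields the same feature set for a given index.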
      • buildClassifier

        public void buildClassifier​(weka.core.Instances data)
                             throws Exception
        Stub method for building the classifiers.
        Parameters:
        data - the training data to be used for generating the ensemble
        Throws:
        Exception - if the classifier could not be built successfully
      • constructEnsemble

        protected weka.classifiers.Classifier constructEnsemble​(weka.core.Instance instance)
        Constructs the ensemble.
        Parameters:
        instance - the instance to base the construction on
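
The source does not say how the instance influences the construction. One plausible reading, given that m_BackupModel covers the case where no ensemble can be constructed at prediction time, is that only those classifiers whose features are present in the instance take part. The sketch below illustrates that reading only; the class `EnsembleSelectionSketch`, the `NaN` encoding of missing values, and the `modelFeatures` layout are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of choosing ensemble members for one instance; an assumed interpretation. */
class EnsembleSelectionSketch {

  /**
   * modelFeatures[m] lists the attribute indices model m was trained on,
   * or is null where no model was built. Returns the indices of the models
   * whose features are all present (not NaN) in the instance.
   */
  static List<Integer> usableModels(double[] instance, int[][] modelFeatures) {
    List<Integer> usable = new ArrayList<>();
    for (int m = 0; m < modelFeatures.length; m++) {
      if (modelFeatures[m] == null) {
        continue; // no classifier was built for this feature
      }
      boolean complete = true;
      for (int att : modelFeatures[m]) {
        if (Double.isNaN(instance[att])) {
          complete = false; // a required feature is missing in this instance
          break;
        }
      }
      if (complete) {
        usable.add(m);
      }
    }
    return usable;
  }

  public static void main(String[] args) {
    double[] instance = {1.0, Double.NaN, 3.0};
    int[][] modelFeatures = {{0}, {1}, null, {0, 2}};
    List<Integer> usable = usableModels(instance, modelFeatures);
    System.out.println("Models used: " + usable);
    // An empty list would correspond to falling back to the backup model.
    System.out.println("Backup model needed: " + usable.isEmpty());
  }
}
```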
      • classifyInstance

        public double classifyInstance​(weka.core.Instance instance)
                                throws Exception
        Classifies the given test instance. The instance has to belong to a dataset when it's being classified.
        Specified by:
        classifyInstance in interface weka.classifiers.Classifier
        Overrides:
        classifyInstance in class weka.classifiers.AbstractClassifier
        Parameters:
        instance - the instance to be classified
        Returns:
        the predicted most likely class for the instance or Utils.missingValue() if no prediction is made
        Throws:
        Exception - if an error occurred during the prediction
      • distributionForInstance

        public double[] distributionForInstance​(weka.core.Instance instance)
                                         throws Exception
        Predicts the class memberships for a given instance. If an instance is unclassified, the returned array elements must be all zero. If the class is numeric, the array must consist of only one element, which contains the predicted value.
        Specified by:
        distributionForInstance in interface weka.classifiers.Classifier
        Overrides:
        distributionForInstance in class weka.classifiers.AbstractClassifier
        Parameters:
        instance - the instance to be classified
        Returns:
        an array containing the estimated membership probabilities of the test instance in each class or the numeric prediction
        Throws:
        Exception - if distribution could not be computed successfully
      • toString

        public String toString()
        Returns a string representation of the classifier.
        Overrides:
        toString in class Object
        Returns:
        the string representation
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.classifiers.AbstractClassifier
        Returns:
        the revision
      • main

        public static void main​(String[] args)
        Main method for running this class from commandline.
        Parameters:
        args - the options