Class VotedImbalance

  • All Implemented Interfaces:
    Serializable, Cloneable, weka.classifiers.Classifier, weka.core.BatchPredictor, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, ModelOutputHandler, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler

    public class VotedImbalance
    extends weka.classifiers.RandomizableSingleClassifierEnhancer
    implements ModelOutputHandler
    Generates an ensemble using the following approach:
    - do x times:
    * create new dataset, resampled with specified bias
    * build base classifier with it
    If no classifier gets built at all, use ZeroR as backup model, built on the full dataset.
    At prediction time, the Vote meta-classifier (using the pre-built classifiers) is used to determine the class probabilities or the regression value.
    Instead of using a fixed number of resampled models, you can also specify thresholds (= probability that the minority class does not meet), each with an associated number of resampled models to use.
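
The procedure described above can be sketched without any Weka dependency. This is a minimal illustration, not the actual implementation of this class: `EnsembleLoopSketch`, `resample` and `build` are hypothetical names, the trainers are plain `Function` objects, and the bias-aware resampling is simplified to an ordinary bootstrap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

final class EnsembleLoopSketch {

  /** Draws a bootstrap sample (with replacement) of the same size as the input. */
  static <T> List<T> resample(List<T> data, Random rnd) {
    List<T> sample = new ArrayList<>(data.size());
    for (int i = 0; i < data.size(); i++) {
      sample.add(data.get(rnd.nextInt(data.size())));
    }
    return sample;
  }

  /**
   * Trains up to numModels models, each on its own resampled dataset.
   * Members whose training throws are skipped; if none succeeds, a single
   * backup model trained on the full dataset is returned instead.
   */
  static <T, M> List<M> build(List<T> data, int numModels, long seed,
                              Function<List<T>, M> trainer,
                              Function<List<T>, M> backupTrainer) {
    List<M> models = new ArrayList<>();
    for (int i = 0; i < numModels; i++) {
      try {
        // one seed per member, so every member sees a different sample
        models.add(trainer.apply(resample(data, new Random(seed + i))));
      } catch (RuntimeException e) {
        // a failed member is simply left out of the ensemble
      }
    }
    if (models.isEmpty()) {
      // backup model (the role ZeroR plays in the description above)
      models.add(backupTrainer.apply(data));
    }
    return models;
  }
}
```

In this sketch the backup trainer only runs when every resampled member fails, which mirrors the "no classifier gets built at all" condition.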

    Valid options are:

     -num-slots <num>
      Number of execution slots.
      (default: 1 - i.e. no parallelism)
     -combination-rule <AVG|PROD|MAJ|MIN|MAX|MED>
      The combination rule to use
      (default: AVG)
     -num-balanced <num>
      Number of balanced datasets (= number of classifiers) to create.
      (default: 1)
     -thresholds <prob=# [prob=# [...]]>
      Thresholds for number of resampled models (probability=#models); blank-separated list.
      (default: none)
     -B <num>
      Bias factor towards uniform class distribution.
      0 = distribution in input data -- 1 = uniform distribution.
      (default 0)
     -no-replacement
      Disables replacement of instances
      (default: with replacement)
     -suppress-model-output
      Suppress model output
      (default: no)
     -S <num>
      Random number seed.
      (default 1)
     -W
      Full name of base classifier.
      (default: weka.classifiers.rules.ZeroR)
     -output-debug-info
      If set, classifier is run in debug mode and
      may output additional info to the console
     -do-not-check-capabilities
      If set, classifier capabilities are not checked before classifier is built
      (use with caution).
     -num-decimal-places
      The number of decimal places for the output of numbers in the model (default 2).
     
     Options specific to classifier weka.classifiers.rules.ZeroR:
     
     -output-debug-info
      If set, classifier is run in debug mode and
      may output additional info to the console
     -do-not-check-capabilities
      If set, classifier capabilities are not checked before classifier is built
      (use with caution).
     -num-decimal-places
      The number of decimal places for the output of numbers in the model (default 2).
    Options after -- are passed to the designated classifier.
    Version:
    $Revision$
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected int m_ActualNumBalanced
      the actual number of balanced datasets to generate.
      protected weka.classifiers.rules.ZeroR m_BackupModel
      The backup classifier, in case no ensemble could be constructed at prediction time.
      protected double m_Bias
      the bias for the dataset balancing (0 = distribution in input data -- 1 = uniform distribution).
      protected weka.classifiers.Classifier[] m_Classifiers
      the actual classifiers in use.
      protected int m_CombinationRule
      Combination Rule variable.
      protected int m_Completed
      The number of classifiers completed so far
      protected weka.core.Instances m_Data
      For holding the original training set temporarily.
      protected weka.classifiers.Classifier m_Ensemble
      the vote classifier in use.
      protected ThreadPoolExecutor m_ExecutorPool
      Pool of threads to train models with
      protected int m_Failed
      The number of classifiers that experienced a failure of some sort during construction.
      protected weka.core.Instances m_Header
      The header of the training set.
      protected boolean m_NoReplacement
      Whether to perform sampling with replacement or without.
      protected int m_NumBalanced
      the number of balanced datasets to generate.
      protected int m_NumExecutionSlots
      The number of threads to have executing at any one time
      protected double m_SamplePercentage
      the sample percentage to use (0-100).
      protected boolean m_SuppressModelOutput
      whether to suppress the model output.
      protected BaseKeyValuePair[] m_Thresholds
      the thresholds to use (pair: probability minority class = num balanced).
      • Fields inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer

        m_Seed
      • Fields inherited from class weka.classifiers.SingleClassifierEnhancer

        m_Classifier
      • Fields inherited from class weka.classifiers.AbstractClassifier

        BATCH_SIZE_DEFAULT, m_BatchSize, m_Debug, m_DoNotCheckCapabilities, m_numDecimalPlaces, NUM_DECIMAL_PLACES_DEFAULT
    • Constructor Summary

      Constructors 
      Constructor Description
      VotedImbalance()  
    • Method Summary

      Modifier and Type Method Description
      String biasTipText()
      Returns the tip text for this property.
      void buildClassifier​(weka.core.Instances data)
      Stub method for building the classifiers.
      protected void buildClassifiers()
      Does the actual construction of the ensemble.
      double classifyInstance​(weka.core.Instance instance)
      Classifies the given test instance.
      String combinationRuleTipText()
      Returns the tip text for this property.
      protected void completedClassifier​(int index, boolean success)
      Records the completion of the training of a single classifier.
      protected weka.classifiers.Classifier constructEnsemble()
      Constructs the ensemble.
      double[] distributionForInstance​(weka.core.Instance instance)
      Predicts the class memberships for a given instance.
      double getBias()
      Gets the bias towards a uniform class distribution.
      weka.core.Capabilities getCapabilities()
      Returns default capabilities of the base classifier.
      weka.core.SelectedTag getCombinationRule()
      Gets the combination rule used
      protected weka.filters.Filter getFilter​(int index, int seed)
      Gets a filter for a particular index.
      boolean getNoReplacement()
      Gets whether instances are drawn with or without replacement.
      int getNumBalanced()
      Returns the number of balanced datasets to generate (= #classifiers).
      int getNumExecutionSlots()
      Get the number of execution slots (threads) to use for building the members of the ensemble.
      String[] getOptions()
      Gets the current settings of the classifier.
      String getRevision()
      Returns the revision string.
      boolean getSuppressModelOutput()
      Returns whether to output the model with the toString() method or not.
      String getThresholds()
      Returns the pairs of threshold/number of resampled models.
      protected weka.core.Instances getTrainingSet​(int index, int seed)
      Gets a training set for a particular index.
      String globalInfo()
      Returns a string describing the classifier.
      Enumeration listOptions()
      Returns an enumeration describing the available options.
      static void main​(String[] args)
      Main method for running this class from commandline.
      String noReplacementTipText()
      Returns the tip text for this property.
      String numBalancedTipText()
      Returns the tip text for this property.
      String numExecutionSlotsTipText()
      Returns the tip text for this property.
      void setBias​(double value)
      Sets the bias towards a uniform class distribution.
      void setClassifier​(weka.classifiers.Classifier newClassifier)  
      void setCombinationRule​(weka.core.SelectedTag value)
      Sets the combination rule to use.
      void setNoReplacement​(boolean value)
      Sets whether instances are drawn with or without replacement.
      void setNumBalanced​(int value)
      Sets the number of balanced datasets to generate (= #classifiers).
      void setNumExecutionSlots​(int value)
      Set the number of execution slots (threads) to use for building the members of the ensemble.
      void setOptions​(String[] options)
      Parses a given list of options.
      void setSuppressModelOutput​(boolean value)
      Sets whether to output the model with the toString() method or not.
      void setThresholds​(String value)
      Set the pairs of threshold/number of resampled models.
      protected void startExecutorPool()
      Start the pool of execution threads.
      String suppressModelOutputTipText()
      Returns the tip text for this property.
      String thresholdsTipText()
      Returns the tip text for this property.
      String toString()
      Returns a string representation of the classifier.
      • Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer

        getSeed, seedTipText, setSeed
      • Methods inherited from class weka.classifiers.SingleClassifierEnhancer

        classifierTipText, defaultClassifierOptions, defaultClassifierString, getClassifier, getClassifierSpec, postExecution, preExecution
      • Methods inherited from class weka.classifiers.AbstractClassifier

        batchSizeTipText, debugTipText, distributionsForInstances, doNotCheckCapabilitiesTipText, forName, getBatchSize, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, implementsMoreEfficientBatchPrediction, makeCopies, makeCopy, numDecimalPlacesTipText, run, runClassifier, setBatchSize, setDebug, setDoNotCheckCapabilities, setNumDecimalPlaces
    • Field Detail

      • m_Classifiers

        protected weka.classifiers.Classifier[] m_Classifiers
        the actual classifiers in use.
      • m_NumExecutionSlots

        protected int m_NumExecutionSlots
        The number of threads to have executing at any one time
      • m_CombinationRule

        protected int m_CombinationRule
        Combination Rule variable.
      • m_NumBalanced

        protected int m_NumBalanced
        the number of balanced datasets to generate.
      • m_Thresholds

        protected BaseKeyValuePair[] m_Thresholds
        the thresholds to use (pair: probability minority class = num balanced).
      • m_ActualNumBalanced

        protected int m_ActualNumBalanced
        the actual number of balanced datasets to generate.
      • m_Bias

        protected double m_Bias
        the bias for the dataset balancing (0 = distribution in input data -- 1 = uniform distribution).
      • m_NoReplacement

        protected boolean m_NoReplacement
        Whether to perform sampling with replacement or without.
      • m_ExecutorPool

        protected transient ThreadPoolExecutor m_ExecutorPool
        Pool of threads to train models with
      • m_Completed

        protected int m_Completed
        The number of classifiers completed so far
      • m_Failed

        protected int m_Failed
        The number of classifiers that experienced a failure of some sort during construction.
      • m_Data

        protected weka.core.Instances m_Data
        For holding the original training set temporarily.
      • m_Header

        protected weka.core.Instances m_Header
        The header of the training set.
      • m_BackupModel

        protected weka.classifiers.rules.ZeroR m_BackupModel
        The backup classifier, in case no ensemble could be constructed at prediction time.
      • m_Ensemble

        protected weka.classifiers.Classifier m_Ensemble
        the vote classifier in use.
      • m_SamplePercentage

        protected double m_SamplePercentage
        the sample percentage to use (0-100).
      • m_SuppressModelOutput

        protected boolean m_SuppressModelOutput
        whether to suppress the model output.
    • Constructor Detail

      • VotedImbalance

        public VotedImbalance()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the classifier.
        Returns:
        a description suitable for displaying in the gui
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions​(String[] options)
                        throws Exception
        Parses a given list of options.
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • getOptions

        public String[] getOptions()
        Gets the current settings of the classifier.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.classifiers.RandomizableSingleClassifierEnhancer
        Returns:
        an array of strings suitable for passing to setOptions
      • setClassifier

        public void setClassifier​(weka.classifiers.Classifier newClassifier)
        Overrides:
        setClassifier in class weka.classifiers.SingleClassifierEnhancer
      • setNumExecutionSlots

        public void setNumExecutionSlots​(int value)
        Set the number of execution slots (threads) to use for building the members of the ensemble.
        Parameters:
        value - the number of slots to use.
      • getNumExecutionSlots

        public int getNumExecutionSlots()
        Get the number of execution slots (threads) to use for building the members of the ensemble.
        Returns:
        the number of slots to use
      • numExecutionSlotsTipText

        public String numExecutionSlotsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setCombinationRule

        public void setCombinationRule​(weka.core.SelectedTag value)
        Sets the combination rule to use.
        Parameters:
        value - the combination rule method to use
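
To illustrate what a combination rule does with the per-model class distributions, here is a small standalone sketch. It is an assumption-laden illustration rather than the code behind this setter: `CombinationRuleSketch` and its `Rule` enum are hypothetical, only AVG, PROD, MAJ, MIN and MAX are covered (MED is left out), and ties under MAJ are not broken.

```java
import java.util.Arrays;

final class CombinationRuleSketch {

  /** Hypothetical stand-in for the rule tags; MED is intentionally omitted. */
  enum Rule { AVG, PROD, MAJ, MIN, MAX }

  /** Combines one distribution per model into a single normalized distribution. */
  static double[] combine(double[][] dists, Rule rule) {
    int numClasses = dists[0].length;
    double[] out = new double[numClasses];
    // neutral starting values for the multiplicative and extremum rules
    if (rule == Rule.PROD) Arrays.fill(out, 1.0);
    if (rule == Rule.MIN) Arrays.fill(out, Double.POSITIVE_INFINITY);
    if (rule == Rule.MAX) Arrays.fill(out, Double.NEGATIVE_INFINITY);

    for (double[] d : dists) {
      if (rule == Rule.MAJ) {
        out[argMax(d)] += 1.0; // each model casts one vote for its top class
        continue;
      }
      for (int c = 0; c < numClasses; c++) {
        switch (rule) {
          case AVG:  out[c] += d[c]; break; // normalizing the sum yields the average
          case PROD: out[c] *= d[c]; break;
          case MIN:  out[c] = Math.min(out[c], d[c]); break;
          case MAX:  out[c] = Math.max(out[c], d[c]); break;
          default:   break;
        }
      }
    }
    return normalize(out);
  }

  /** Index of the first largest element. */
  static int argMax(double[] v) {
    int best = 0;
    for (int i = 1; i < v.length; i++) {
      if (v[i] > v[best]) best = i;
    }
    return best;
  }

  /** Scales the values so they sum to 1; an all-zero array is returned unchanged. */
  static double[] normalize(double[] v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    if (sum == 0.0) return v;
    double[] result = new double[v.length];
    for (int i = 0; i < v.length; i++) result[i] = v[i] / sum;
    return result;
  }
}
```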
      • getCombinationRule

        public weka.core.SelectedTag getCombinationRule()
        Gets the combination rule used
        Returns:
        the combination rule used
      • combinationRuleTipText

        public String combinationRuleTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumBalanced

        public void setNumBalanced​(int value)
        Sets the number of balanced datasets to generate (= #classifiers).
        Parameters:
        value - the number of datasets
      • getNumBalanced

        public int getNumBalanced()
        Returns the number of balanced datasets to generate (= #classifiers).
        Returns:
        the number of datasets
      • numBalancedTipText

        public String numBalancedTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setThresholds

        public void setThresholds​(String value)
        Set the pairs of threshold/number of resampled models.
        Parameters:
        value - the pairs (blank-separated list; probability=#models)
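
The `probability=#models` format can be parsed with a few lines of plain Java. The sketch below is illustrative only: `ThresholdSketch` is a hypothetical helper, and the lookup rule in `numModels` (use the smallest threshold that the minority-class probability falls below, otherwise the default) is an assumption about how the pairs are applied, not something this page confirms.

```java
import java.util.Map;
import java.util.TreeMap;

final class ThresholdSketch {

  /** Parses a blank-separated list such as "0.1=10 0.3=5" into a sorted map. */
  static TreeMap<Double, Integer> parse(String spec) {
    TreeMap<Double, Integer> result = new TreeMap<>();
    for (String pair : spec.trim().split("\\s+")) {
      if (pair.isEmpty()) continue; // an empty specification yields an empty map
      String[] kv = pair.split("=", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("Expected probability=#models, got: " + pair);
      }
      result.put(Double.parseDouble(kv[0]), Integer.parseInt(kv[1]));
    }
    return result;
  }

  /**
   * ASSUMPTION: picks the entry with the smallest threshold that is strictly
   * greater than the minority-class probability; falls back to defaultNum.
   */
  static int numModels(TreeMap<Double, Integer> thresholds, double minorityProb, int defaultNum) {
    Map.Entry<Double, Integer> entry = thresholds.higherEntry(minorityProb);
    return entry == null ? defaultNum : entry.getValue();
  }
}
```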
      • getThresholds

        public String getThresholds()
        Returns the pairs of threshold/number of resampled models.
        Returns:
        the pairs
      • thresholdsTipText

        public String thresholdsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setBias

        public void setBias​(double value)
        Sets the bias towards a uniform class distribution. A value of 0 leaves the class distribution as-is; a value of 1 makes the class distribution uniform in the output data.
        Parameters:
        value - the new bias value, between 0 and 1.
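
One way to read the bias value is as a linear interpolation between the observed class counts and a uniform split. The sketch below assumes exactly that; this page does not state the formula, and `BiasSketch.targetCounts` is a hypothetical helper.

```java
final class BiasSketch {

  /**
   * Returns the target number of instances per class for a given bias.
   * ASSUMPTION: linear interpolation between the input counts (bias = 0)
   * and an equal share of all instances per class (bias = 1).
   */
  static double[] targetCounts(int[] classCounts, double bias) {
    if (bias < 0.0 || bias > 1.0) {
      throw new IllegalArgumentException("Bias must be between 0 and 1: " + bias);
    }
    int total = 0;
    for (int count : classCounts) total += count;
    double uniform = (double) total / classCounts.length;
    double[] result = new double[classCounts.length];
    for (int i = 0; i < classCounts.length; i++) {
      result[i] = (1.0 - bias) * classCounts[i] + bias * uniform;
    }
    return result;
  }
}
```

Under this assumption, counts of 90 and 10 stay unchanged for a bias of 0, become 70 and 30 for a bias of 0.5, and become 50 and 50 for a bias of 1.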
      • getBias

        public double getBias()
        Gets the bias towards a uniform class distribution. A value of 0 leaves the class distribution as-is; a value of 1 makes the class distribution uniform in the output data.
        Returns:
        the current bias
      • biasTipText

        public String biasTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNoReplacement

        public void setNoReplacement​(boolean value)
        Sets whether instances are drawn with or without replacement.
        Parameters:
        value - if true then the replacement of instances is disabled
      • getNoReplacement

        public boolean getNoReplacement()
        Gets whether instances are drawn with or without replacement.
        Returns:
        true if the replacement is disabled
      • noReplacementTipText

        public String noReplacementTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSuppressModelOutput

        public void setSuppressModelOutput​(boolean value)
        Sets whether to output the model with the toString() method or not.
        Specified by:
        setSuppressModelOutput in interface ModelOutputHandler
        Parameters:
        value - true if to suppress model output
      • getSuppressModelOutput

        public boolean getSuppressModelOutput()
        Returns whether to output the model with the toString() method or not.
        Specified by:
        getSuppressModelOutput in interface ModelOutputHandler
        Returns:
        true if the model output is suppressed
      • suppressModelOutputTipText

        public String suppressModelOutputTipText()
        Returns the tip text for this property.
        Specified by:
        suppressModelOutputTipText in interface ModelOutputHandler
        Returns:
        tip text for this property suitable for displaying in the gui
      • startExecutorPool

        protected void startExecutorPool()
        Start the pool of execution threads.
      • getFilter

        protected weka.filters.Filter getFilter​(int index,
                                                int seed)
                                         throws Exception
        Gets a filter for a particular index.
        Parameters:
        index - the index for the requested filter
        seed - the seed value to use for determining the additional random features
        Throws:
        Exception - if something goes wrong
      • getTrainingSet

        protected weka.core.Instances getTrainingSet​(int index,
                                                     int seed)
                                              throws Exception
        Gets a training set for a particular index.
        Parameters:
        index - the index for the requested training set
        seed - the seed value to use for determining the additional random features
        Returns:
        the training set for the supplied index
        Throws:
        Exception - if something goes wrong
      • completedClassifier

        protected void completedClassifier​(int index,
                                           boolean success)
        Records the completion of the training of a single classifier. Unblocks if all classifiers have been trained.
        Parameters:
        index - the index of the classifier that has completed
        success - whether the classifier trained successfully
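
The bookkeeping described here (count successes and failures, unblock once every member has reported) can be sketched with a fixed thread pool and a monitor. This is a hedged illustration and not the code of this class: `ParallelBuildSketch`, `completedTask`, `awaitAll` and `runAll` are hypothetical names, and training jobs are modelled as plain `Runnable` objects.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelBuildSketch {
  private final int total;
  private int completed; // members that trained successfully
  private int failed;    // members whose training threw an exception

  ParallelBuildSketch(int total) {
    this.total = total;
  }

  /** Records one finished member and wakes the waiting thread when all are done. */
  synchronized void completedTask(int index, boolean success) {
    if (success) completed++; else failed++;
    if (completed + failed == total) notifyAll();
  }

  /** Blocks until every member has reported success or failure. */
  synchronized void awaitAll() throws InterruptedException {
    while (completed + failed < total) wait();
  }

  synchronized int getCompleted() { return completed; }

  synchronized int getFailed() { return failed; }

  /** Runs all tasks on a pool with the given number of slots and waits for them. */
  static ParallelBuildSketch runAll(Runnable[] tasks, int slots) {
    ParallelBuildSketch tracker = new ParallelBuildSketch(tasks.length);
    ExecutorService pool = Executors.newFixedThreadPool(slots);
    try {
      for (int i = 0; i < tasks.length; i++) {
        final int index = i;
        pool.execute(() -> {
          boolean ok = true;
          try {
            tasks[index].run();
          } catch (RuntimeException e) {
            ok = false; // a failure is counted, not propagated
          }
          tracker.completedTask(index, ok);
        });
      }
      tracker.awaitAll();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new IllegalStateException("Interrupted while waiting for training", e);
    } finally {
      pool.shutdown();
    }
    return tracker;
  }
}
```

With a single slot the tasks run one after another, which corresponds to the "no parallelism" default of one execution slot.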
      • buildClassifiers

        protected void buildClassifiers()
                                 throws Exception
        Does the actual construction of the ensemble.
        Throws:
        Exception - if something goes wrong during the training process
      • constructEnsemble

        protected weka.classifiers.Classifier constructEnsemble()
        Constructs the ensemble.
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns default capabilities of the base classifier.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Specified by:
        getCapabilities in interface weka.classifiers.Classifier
        Overrides:
        getCapabilities in class weka.classifiers.SingleClassifierEnhancer
        Returns:
        the capabilities of the base classifier
      • buildClassifier

        public void buildClassifier​(weka.core.Instances data)
                             throws Exception
        Stub method for building the classifiers.
        Specified by:
        buildClassifier in interface weka.classifiers.Classifier
        Parameters:
        data - the training data to be used for generating the ensemble
        Throws:
        Exception - if the classifier could not be built successfully
      • classifyInstance

        public double classifyInstance​(weka.core.Instance instance)
                                throws Exception
        Classifies the given test instance. The instance has to belong to a dataset when it's being classified.
        Specified by:
        classifyInstance in interface weka.classifiers.Classifier
        Overrides:
        classifyInstance in class weka.classifiers.AbstractClassifier
        Parameters:
        instance - the instance to be classified
        Returns:
        the predicted most likely class for the instance or Utils.missingValue() if no prediction is made
        Throws:
        Exception - if an error occurred during the prediction
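
For a nominal class, the relationship between a class distribution and the returned class index can be sketched as follows. This is an illustration under assumptions: `ClassifySketch.classify` is a hypothetical helper, the missing value is represented as `Double.NaN` (which is what `Utils.missingValue()` returns in Weka), and numeric classes are not covered.

```java
final class ClassifySketch {

  /**
   * Returns the index of the most probable class, or NaN when no class
   * has a probability greater than zero (i.e. no prediction was made).
   */
  static double classify(double[] dist) {
    int best = -1;
    double max = 0.0;
    for (int i = 0; i < dist.length; i++) {
      if (dist[i] > max) {
        max = dist[i];
        best = i;
      }
    }
    return best < 0 ? Double.NaN : best;
  }
}
```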
      • distributionForInstance

        public double[] distributionForInstance​(weka.core.Instance instance)
                                         throws Exception
        Predicts the class memberships for a given instance. If an instance is unclassified, the returned array elements must be all zero. If the class is numeric, the array must consist of only one element, which contains the predicted value.
        Specified by:
        distributionForInstance in interface weka.classifiers.Classifier
        Overrides:
        distributionForInstance in class weka.classifiers.AbstractClassifier
        Parameters:
        instance - the instance to be classified
        Returns:
        an array containing the estimated membership probabilities of the test instance in each class or the numeric prediction
        Throws:
        Exception - if distribution could not be computed successfully
      • toString

        public String toString()
        Returns a string representation of the classifier.
        Overrides:
        toString in class Object
        Returns:
        the string representation
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.classifiers.AbstractClassifier
        Returns:
        the revision
      • main

        public static void main​(String[] args)
        Main method for running this class from commandline.
        Parameters:
        args - the options