Class OzaBagASHT

  • All Implemented Interfaces:
    Configurable, Serializable, CapabilitiesHandler, Classifier, MultiClassClassifier, AWTRenderable, Learner<Example<Instance>>, MOAObject, OptionHandler

    public class OzaBagASHT
    extends AbstractClassifier
    implements MultiClassClassifier
    Bagging using trees of different size. The Adaptive-Size Hoeffding Tree (ASHT) is derived from the Hoeffding Tree algorithm with the following differences:
    • it has a maximum number of split nodes, or size
    • after one node splits, if the number of split nodes of the ASHT tree is higher than the maximum value, then it deletes some nodes to reduce its size
    The intuition behind this method is as follows: smaller trees adapt more quickly to changes, and larger trees do better during periods with no or little change, simply because they were built on more data. Trees limited to size s will be reset about twice as often as trees with a size limit of 2s. This creates a set of different reset-speeds for an ensemble of such trees, and therefore a subset of trees that are a good approximation for the current rate of change. It is important to note that resets will happen all the time, even for stationary datasets, but this behaviour should not have a negative impact on the ensemble’s predictive performance. When the tree size exceeds the maximun size value, there are two different delete options:
    • delete the oldest node, the root, and all of its children except the one where the split has been made. After that, the root of the child not deleted becomes the new root
    • delete all the nodes of the tree, i.e., restart from a new root.
    The maximum allowed size for the n-th ASHT tree is twice the maximum allowed size for the (n-1)-th tree. Moreover, each tree has a weight proportional to the inverse of the square of its error, and it monitors its error with an exponential weighted moving average (EWMA) with alpha = .01. The size of the first tree is 2.

    With this new method, it is attempted to improve bagging performance by increasing tree diversity. It has been observed that boosting tends to produce a more diverse set of classifiers than bagging, and this has been cited as a factor in increased performance.
    See more details in:

    Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Richard Kirkby, and Ricard Gavaldà. New ensemble methods for evolving data streams. In 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.

    The learner must be ASHoeffdingTree, a Hoeffding Tree with a maximum size value.

    Example:

    OzaBagASHT -l ASHoeffdingTree -s 10 -u -r Parameters:
    • Same parameters as OzaBag
    • -f : the size of first classifier in the bag.
    • -u : Enable weight classifiers
    • -e : Reset trees when size is higher than the max
    Version:
    $Revision: 7 $
    Author:
    Albert Bifet (abifet at cs dot waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • ensembleSizeOption

        public IntOption ensembleSizeOption
      • firstClassifierSizeOption

        public IntOption firstClassifierSizeOption
      • useWeightOption

        public FlagOption useWeightOption
      • resetTreesOption

        public FlagOption resetTreesOption
      • baseLearnerOption

        public ClassOption baseLearnerOption
      • error

        protected double[] error
      • alpha

        protected double alpha
    • Constructor Detail

      • OzaBagASHT

        public OzaBagASHT()
    • Method Detail

      • resetLearningImpl

        public void resetLearningImpl()
        Description copied from class: AbstractClassifier
        Resets this classifier. It must be similar to starting a new classifier from scratch.

        The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.
        Specified by:
        resetLearningImpl in class AbstractClassifier
      • trainOnInstanceImpl

        public void trainOnInstanceImpl​(Instance inst)
        Description copied from class: AbstractClassifier
        Trains this classifier incrementally using the given instance.

        The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.
        Specified by:
        trainOnInstanceImpl in class AbstractClassifier
        Parameters:
        inst - the instance to be used for training
      • getVotesForInstance

        public double[] getVotesForInstance​(Instance inst)
        Description copied from interface: Classifier
        Predicts the class memberships for a given instance. If an instance is unclassified, the returned array elements must be all zero.
        Specified by:
        getVotesForInstance in interface Classifier
        Specified by:
        getVotesForInstance in class AbstractClassifier
        Parameters:
        inst - the instance to be classified
        Returns:
        an array containing the estimated membership probabilities of the test instance in each class
      • getModelDescription

        public void getModelDescription​(StringBuilder out,
                                        int indent)
        Description copied from class: AbstractClassifier
        Returns a string representation of the model.
        Specified by:
        getModelDescription in class AbstractClassifier
        Parameters:
        out - the stringbuilder to add the description
        indent - the number of characters to indent
      • isRandomizable

        public boolean isRandomizable()
        Description copied from interface: Learner
        Gets whether this learner needs a random seed. Examples of methods that needs a random seed are bagging and boosting.
        Specified by:
        isRandomizable in interface Learner<Example<Instance>>
        Returns:
        true if the learner needs a random seed.
      • getModelMeasurementsImpl

        protected Measurement[] getModelMeasurementsImpl()
        Description copied from class: AbstractClassifier
        Gets the current measurements of this classifier.

        The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.
        Specified by:
        getModelMeasurementsImpl in class AbstractClassifier
        Returns:
        an array of measurements to be used in evaluation tasks