Class HoeffdingTree

  • All Implemented Interfaces:
    Configurable, Serializable, CapabilitiesHandler, Classifier, MultiClassClassifier, AWTRenderable, Learner<Example<Instance>>, MOAObject, OptionHandler
    Direct Known Subclasses:
    ARFHoeffdingTree, ASHoeffdingTree, HoeffdingAdaptiveTree, HoeffdingTreeClassifLeaves, LimAttHoeffdingTree, RandomHoeffdingTree

    public class HoeffdingTree
    extends AbstractClassifier
    implements MultiClassClassifier, CapabilitiesHandler
    Hoeffding Tree or VFDT. A Hoeffding tree is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute).

    A theoretically appealing feature of Hoeffding Trees not shared by other incremental decision tree learners is that it has sound guarantees of performance. Using the Hoeffding bound one can show that its output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. See for details:

    G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In KDD’01, pages 97–106, San Francisco, CA, 2001. ACM Press.

    Parameters:

    • -m : Maximum memory consumed by the tree
    • -n : Numeric estimator to use :
      • Gaussian approximation evaluating 10 splitpoints
      • Gaussian approximation evaluating 100 splitpoints
      • Greenwald-Khanna quantile summary with 10 tuples
      • Greenwald-Khanna quantile summary with 100 tuples
      • Greenwald-Khanna quantile summary with 1000 tuples
      • VFML method with 10 bins
      • VFML method with 100 bins
      • VFML method with 1000 bins
      • Exhaustive binary tree
    • -e : How many instances between memory consumption checks
    • -g : The number of instances a leaf should observe between split attempts
    • -s : Split criterion to use. Example : InfoGainSplitCriterion
    • -c : The allowable error in split decision, values closer to 0 will take longer to decide
    • -t : Threshold below which a split will be forced to break ties
    • -b : Only allow binary splits
    • -z : Stop growing as soon as memory limit is hit
    • -r : Disable poor attributes
    • -p : Disable pre-pruning
    • -l : Leaf prediction to use: MajorityClass (MC), Naive Bayes (NB) or NaiveBayes adaptive (NBAdaptive).
    • -q : The number of instances a leaf should observe before permitting Naive Bayes
    Version:
    $Revision: 7 $
    Author:
    Richard Kirkby (rkirkby@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Field Detail

      • maxByteSizeOption

        public IntOption maxByteSizeOption
      • numericEstimatorOption

        public ClassOption numericEstimatorOption
      • nominalEstimatorOption

        public ClassOption nominalEstimatorOption
      • memoryEstimatePeriodOption

        public IntOption memoryEstimatePeriodOption
      • gracePeriodOption

        public IntOption gracePeriodOption
      • splitCriterionOption

        public ClassOption splitCriterionOption
      • splitConfidenceOption

        public FloatOption splitConfidenceOption
      • tieThresholdOption

        public FloatOption tieThresholdOption
      • binarySplitsOption

        public FlagOption binarySplitsOption
      • stopMemManagementOption

        public FlagOption stopMemManagementOption
      • removePoorAttsOption

        public FlagOption removePoorAttsOption
      • noPrePruneOption

        public FlagOption noPrePruneOption
      • decisionNodeCount

        protected int decisionNodeCount
      • activeLeafNodeCount

        protected int activeLeafNodeCount
      • inactiveLeafNodeCount

        protected int inactiveLeafNodeCount
      • inactiveLeafByteSizeEstimate

        protected double inactiveLeafByteSizeEstimate
      • activeLeafByteSizeEstimate

        protected double activeLeafByteSizeEstimate
      • byteSizeEstimateOverheadFraction

        protected double byteSizeEstimateOverheadFraction
      • growthAllowed

        protected boolean growthAllowed
      • nbThresholdOption

        public IntOption nbThresholdOption
    • Constructor Detail

      • HoeffdingTree

        public HoeffdingTree()