Package moa.classifiers.trees
Class HoeffdingTree
- java.lang.Object
-
- moa.AbstractMOAObject
-
- moa.options.AbstractOptionHandler
-
- moa.classifiers.AbstractClassifier
-
- moa.classifiers.trees.HoeffdingTree
-
- All Implemented Interfaces:
Configurable
,Serializable
,CapabilitiesHandler
,Classifier
,MultiClassClassifier
,AWTRenderable
,Learner<Example<Instance>>
,MOAObject
,OptionHandler
- Direct Known Subclasses:
ARFHoeffdingTree
,ASHoeffdingTree
,HoeffdingAdaptiveTree
,HoeffdingTreeClassifLeaves
,LimAttHoeffdingTree
,RandomHoeffdingTree
public class HoeffdingTree extends AbstractClassifier implements MultiClassClassifier, CapabilitiesHandler
Hoeffding Tree or VFDT. A Hoeffding tree is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute).A theoretically appealing feature of Hoeffding Trees not shared by other incremental decision tree learners is that it has sound guarantees of performance. Using the Hoeffding bound one can show that its output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. See for details:
G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In KDD’01, pages 97–106, San Francisco, CA, 2001. ACM Press.
Parameters:
- -m : Maximum memory consumed by the tree
- -n : Numeric estimator to use :
- Gaussian approximation evaluating 10 splitpoints
- Gaussian approximation evaluating 100 splitpoints
- Greenwald-Khanna quantile summary with 10 tuples
- Greenwald-Khanna quantile summary with 100 tuples
- Greenwald-Khanna quantile summary with 1000 tuples
- VFML method with 10 bins
- VFML method with 100 bins
- VFML method with 1000 bins
- Exhaustive binary tree
- -e : How many instances between memory consumption checks
- -g : The number of instances a leaf should observe between split attempts
- -s : Split criterion to use. Example : InfoGainSplitCriterion
- -c : The allowable error in split decision, values closer to 0 will take longer to decide
- -t : Threshold below which a split will be forced to break ties
- -b : Only allow binary splits
- -z : Stop growing as soon as memory limit is hit
- -r : Disable poor attributes
- -p : Disable pre-pruning
- -l : Leaf prediction to use: MajorityClass (MC), Naive Bayes (NB) or NaiveBayes adaptive (NBAdaptive).
- -q : The number of instances a leaf should observe before permitting Naive Bayes
- Version:
- $Revision: 7 $
- Author:
- Richard Kirkby (rkirkby@cs.waikato.ac.nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
HoeffdingTree.ActiveLearningNode
static class
HoeffdingTree.FoundNode
static class
HoeffdingTree.InactiveLearningNode
static class
HoeffdingTree.LearningNode
static class
HoeffdingTree.LearningNodeNB
static class
HoeffdingTree.LearningNodeNBAdaptive
static class
HoeffdingTree.Node
static class
HoeffdingTree.SplitNode
-
Field Summary
Fields Modifier and Type Field Description protected double
activeLeafByteSizeEstimate
protected int
activeLeafNodeCount
FlagOption
binarySplitsOption
protected double
byteSizeEstimateOverheadFraction
protected int
decisionNodeCount
IntOption
gracePeriodOption
protected boolean
growthAllowed
protected double
inactiveLeafByteSizeEstimate
protected int
inactiveLeafNodeCount
MultiChoiceOption
leafpredictionOption
IntOption
maxByteSizeOption
IntOption
memoryEstimatePeriodOption
IntOption
nbThresholdOption
ClassOption
nominalEstimatorOption
FlagOption
noPrePruneOption
ClassOption
numericEstimatorOption
FlagOption
removePoorAttsOption
FloatOption
splitConfidenceOption
ClassOption
splitCriterionOption
FlagOption
stopMemManagementOption
FloatOption
tieThresholdOption
protected HoeffdingTree.Node
treeRoot
-
Fields inherited from class moa.classifiers.AbstractClassifier
classifierRandom, modelContext, randomSeed, randomSeedOption, trainingWeightSeenByModel
-
Fields inherited from class moa.options.AbstractOptionHandler
config
-
-
Constructor Summary
Constructors Constructor Description HoeffdingTree()
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected void
activateLearningNode(HoeffdingTree.InactiveLearningNode toActivate, HoeffdingTree.SplitNode parent, int parentBranch)
protected void
attemptToSplit(HoeffdingTree.ActiveLearningNode node, HoeffdingTree.SplitNode parent, int parentIndex)
int
calcByteSize()
static double
computeHoeffdingBound(double range, double confidence, double n)
void
deactivateAllLeaves()
protected void
deactivateLearningNode(HoeffdingTree.ActiveLearningNode toDeactivate, HoeffdingTree.SplitNode parent, int parentBranch)
ImmutableCapabilities
defineImmutableCapabilities()
Defines the set of capabilities the object has.void
enforceTrackerLimit()
void
estimateModelByteSizes()
protected HoeffdingTree.FoundNode[]
findLearningNodes()
protected void
findLearningNodes(HoeffdingTree.Node node, HoeffdingTree.SplitNode parent, int parentBranch, List<HoeffdingTree.FoundNode> found)
void
getModelDescription(StringBuilder out, int indent)
Returns a string representation of the model.protected Measurement[]
getModelMeasurementsImpl()
Gets the current measurements of this classifier.
The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods.int
getNodeCount()
String
getPurposeString()
Dictionary with option texts and objectsHoeffdingTree.Node
getTreeRoot()
double[]
getVotesForInstance(Instance inst)
Predicts the class memberships for a given instance.boolean
isRandomizable()
Gets whether this learner needs a random seed.int
measureByteSize()
Gets the memory size of this object.int
measureTreeDepth()
protected HoeffdingTree.LearningNode
newLearningNode()
protected HoeffdingTree.LearningNode
newLearningNode(double[] initialClassObservations)
protected AttributeClassObserver
newNominalClassObserver()
protected AttributeClassObserver
newNumericClassObserver()
protected HoeffdingTree.SplitNode
newSplitNode(InstanceConditionalTest splitTest, double[] classObservations)
protected HoeffdingTree.SplitNode
newSplitNode(InstanceConditionalTest splitTest, double[] classObservations, int size)
void
resetLearningImpl()
Resets this classifier.void
trainOnInstanceImpl(Instance inst)
Trains this classifier incrementally using the given instance.
The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods.-
Methods inherited from class moa.classifiers.AbstractClassifier
contextIsCompatible, copy, correctlyClassifies, getAttributeNameString, getAWTRenderer, getClassLabelString, getClassNameString, getDescription, getModel, getModelContext, getModelMeasurements, getNominalValueString, getPredictionForInstance, getPredictionForInstance, getSubClassifiers, getSublearners, getVotesForInstance, modelAttIndexToInstanceAttIndex, modelAttIndexToInstanceAttIndex, prepareForUseImpl, resetLearning, setModelContext, setRandomSeed, trainingHasStarted, trainingWeightSeenByModel, trainOnInstance, trainOnInstance
-
Methods inherited from class moa.options.AbstractOptionHandler
getCLICreationString, getOptions, getPreparedClassOption, prepareClassOptions, prepareForUse, prepareForUse
-
Methods inherited from class moa.AbstractMOAObject
copy, measureByteSize, toString
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface moa.capabilities.CapabilitiesHandler
getCapabilities
-
Methods inherited from interface moa.options.OptionHandler
getCLICreationString, getOptions, prepareForUse, prepareForUse
-
-
-
-
Field Detail
-
maxByteSizeOption
public IntOption maxByteSizeOption
-
numericEstimatorOption
public ClassOption numericEstimatorOption
-
nominalEstimatorOption
public ClassOption nominalEstimatorOption
-
memoryEstimatePeriodOption
public IntOption memoryEstimatePeriodOption
-
gracePeriodOption
public IntOption gracePeriodOption
-
splitCriterionOption
public ClassOption splitCriterionOption
-
splitConfidenceOption
public FloatOption splitConfidenceOption
-
tieThresholdOption
public FloatOption tieThresholdOption
-
binarySplitsOption
public FlagOption binarySplitsOption
-
stopMemManagementOption
public FlagOption stopMemManagementOption
-
removePoorAttsOption
public FlagOption removePoorAttsOption
-
noPrePruneOption
public FlagOption noPrePruneOption
-
treeRoot
protected HoeffdingTree.Node treeRoot
-
decisionNodeCount
protected int decisionNodeCount
-
activeLeafNodeCount
protected int activeLeafNodeCount
-
inactiveLeafNodeCount
protected int inactiveLeafNodeCount
-
inactiveLeafByteSizeEstimate
protected double inactiveLeafByteSizeEstimate
-
activeLeafByteSizeEstimate
protected double activeLeafByteSizeEstimate
-
byteSizeEstimateOverheadFraction
protected double byteSizeEstimateOverheadFraction
-
growthAllowed
protected boolean growthAllowed
-
leafpredictionOption
public MultiChoiceOption leafpredictionOption
-
nbThresholdOption
public IntOption nbThresholdOption
-
-
Method Detail
-
getPurposeString
public String getPurposeString()
Description copied from class:AbstractOptionHandler
Dictionary with option texts and objects- Specified by:
getPurposeString
in interfaceOptionHandler
- Overrides:
getPurposeString
in classAbstractClassifier
- Returns:
- the string with the purpose of this object
-
calcByteSize
public int calcByteSize()
-
getNodeCount
public int getNodeCount()
-
getTreeRoot
public HoeffdingTree.Node getTreeRoot()
-
measureByteSize
public int measureByteSize()
Description copied from interface:MOAObject
Gets the memory size of this object.- Specified by:
measureByteSize
in interfaceMOAObject
- Overrides:
measureByteSize
in classAbstractMOAObject
- Returns:
- the memory size of this object
-
resetLearningImpl
public void resetLearningImpl()
Description copied from class:AbstractClassifier
Resets this classifier. It must be similar to starting a new classifier from scratch.
The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.- Specified by:
resetLearningImpl
in classAbstractClassifier
-
trainOnInstanceImpl
public void trainOnInstanceImpl(Instance inst)
Description copied from class:AbstractClassifier
Trains this classifier incrementally using the given instance.
The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.- Specified by:
trainOnInstanceImpl
in classAbstractClassifier
- Parameters:
inst
- the instance to be used for training
-
getVotesForInstance
public double[] getVotesForInstance(Instance inst)
Description copied from interface:Classifier
Predicts the class memberships for a given instance. If an instance is unclassified, the returned array elements must be all zero.- Specified by:
getVotesForInstance
in interfaceClassifier
- Specified by:
getVotesForInstance
in classAbstractClassifier
- Parameters:
inst
- the instance to be classified- Returns:
- an array containing the estimated membership probabilities of the test instance in each class
-
getModelMeasurementsImpl
protected Measurement[] getModelMeasurementsImpl()
Description copied from class:AbstractClassifier
Gets the current measurements of this classifier.
The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.- Specified by:
getModelMeasurementsImpl
in classAbstractClassifier
- Returns:
- an array of measurements to be used in evaluation tasks
-
measureTreeDepth
public int measureTreeDepth()
-
getModelDescription
public void getModelDescription(StringBuilder out, int indent)
Description copied from class:AbstractClassifier
Returns a string representation of the model.- Specified by:
getModelDescription
in classAbstractClassifier
- Parameters:
out
- the stringbuilder to add the descriptionindent
- the number of characters to indent
-
isRandomizable
public boolean isRandomizable()
Description copied from interface:Learner
Gets whether this learner needs a random seed. Examples of methods that needs a random seed are bagging and boosting.- Specified by:
isRandomizable
in interfaceLearner<Example<Instance>>
- Returns:
- true if the learner needs a random seed.
-
computeHoeffdingBound
public static double computeHoeffdingBound(double range, double confidence, double n)
-
newSplitNode
protected HoeffdingTree.SplitNode newSplitNode(InstanceConditionalTest splitTest, double[] classObservations, int size)
-
newSplitNode
protected HoeffdingTree.SplitNode newSplitNode(InstanceConditionalTest splitTest, double[] classObservations)
-
newNominalClassObserver
protected AttributeClassObserver newNominalClassObserver()
-
newNumericClassObserver
protected AttributeClassObserver newNumericClassObserver()
-
attemptToSplit
protected void attemptToSplit(HoeffdingTree.ActiveLearningNode node, HoeffdingTree.SplitNode parent, int parentIndex)
-
enforceTrackerLimit
public void enforceTrackerLimit()
-
estimateModelByteSizes
public void estimateModelByteSizes()
-
deactivateAllLeaves
public void deactivateAllLeaves()
-
deactivateLearningNode
protected void deactivateLearningNode(HoeffdingTree.ActiveLearningNode toDeactivate, HoeffdingTree.SplitNode parent, int parentBranch)
-
activateLearningNode
protected void activateLearningNode(HoeffdingTree.InactiveLearningNode toActivate, HoeffdingTree.SplitNode parent, int parentBranch)
-
findLearningNodes
protected HoeffdingTree.FoundNode[] findLearningNodes()
-
findLearningNodes
protected void findLearningNodes(HoeffdingTree.Node node, HoeffdingTree.SplitNode parent, int parentBranch, List<HoeffdingTree.FoundNode> found)
-
newLearningNode
protected HoeffdingTree.LearningNode newLearningNode()
-
newLearningNode
protected HoeffdingTree.LearningNode newLearningNode(double[] initialClassObservations)
-
defineImmutableCapabilities
public ImmutableCapabilities defineImmutableCapabilities()
Description copied from interface:CapabilitiesHandler
Defines the set of capabilities the object has. Should be overridden if the object's capabilities do not change.- Specified by:
defineImmutableCapabilities
in interfaceCapabilitiesHandler
- Overrides:
defineImmutableCapabilities
in classAbstractClassifier
- Returns:
- The capabilities of the object.
-
-