Class KDTree
- java.lang.Object
-
- moa.classifiers.lazy.neighboursearch.NearestNeighbourSearch
-
- moa.classifiers.lazy.neighboursearch.KDTree
-
- All Implemented Interfaces:
Serializable
public class KDTree extends NearestNeighbourSearch
Class implementing the KDTree search algorithm for nearest neighbour search.
The connection to dataset is only a reference. For the tree structure the indexes are stored in an array.
Building the tree:
If a node has <maximal-inst-number> (option -L) instances no further splitting is done. Also if the split would leave one side empty, the branch is not split any further even if the instances in the resulting node are more than <maximal-inst-number> instances.
**PLEASE NOTE:** The algorithm can not handle missing values, so it is advisable to run ReplaceMissingValues filter if there are any missing values in the dataset.
For more information see:
Jerome H. Friedman, Jon Luis Bentley, Raphael Ari Finkel (1977). An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Mathematics Software. 3(3):209-226.
Andrew Moore (1991). A tutorial on kd-trees. BibTeX:@article{Friedman1977, author = {Jerome H. Friedman and Jon Luis Bentley and Raphael Ari Finkel}, journal = {ACM Transactions on Mathematics Software}, month = {September}, number = {3}, pages = {209-226}, title = {An Algorithm for Finding Best Matches in Logarithmic Expected Time}, volume = {3}, year = {1977} } @techreport{Moore1991, author = {Andrew Moore}, booktitle = {University of Cambridge Computer Laboratory Technical Report No. 209}, howpublished = {Extract from PhD Thesis}, title = {A tutorial on kd-trees}, year = {1991}, HTTP = {Available from http://www.autonlab.org/autonweb/14665.html} }
Valid options are:-S <classname and options> Node splitting method to use. (default: weka.core.neighboursearch.kdtrees.SlidingMidPointOfWidestSide)
-W <value> Set minimal width of a box (default: 1.0E-2).
-L Maximal number of instances in a leaf (default: 40).
-N Normalizing will be done (Select dimension for split, with normalising to universe).
- Version:
- $Revision: 8034 $
- Author:
- Gabi Schmidberger (gabi[at-the-rate]cs[dot]waikato[dot]ac[dot]nz), Malcolm Ware (mfw4[at-the-rate]cs[dot]waikato[dot]ac[dot]nz), Ashraf M. Kibriya (amk14[at-the-rate]cs[dot]waikato[dot]ac[dot]nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class moa.classifiers.lazy.neighboursearch.NearestNeighbourSearch
NearestNeighbourSearch.MyHeap, NearestNeighbourSearch.MyHeapElement, NearestNeighbourSearch.NeighborList, NearestNeighbourSearch.NeighborNode
-
-
Field Summary
Fields Modifier and Type Field Description protected double[]
m_DistanceList
Array holding the distances of the nearest neighbours.protected EuclideanDistance
m_EuclideanDistance
The euclidean distance function to use.protected int[]
m_InstList
Indexlist of the instances of this kdtree.protected int
m_MaxDepth
Tree stats.protected int
m_MaxInstInLeaf
maximal number of instances in a leaf.protected double
m_MinBoxRelWidth
minimal relative width of a KDTree rectangle.protected int
m_NumLeaves
Tree stats.protected int
m_NumNodes
Tree stats.protected KDTreeNode
m_Root
The root node of the tree.protected KDTreeNodeSplitter
m_Splitter
The node splitter.static int
MAX
The index of MAX value in attributes' range array.static int
MIN
The index of MIN value in attributes' range array.static int
WIDTH
The index of WIDTH (MAX-MIN) value in attributes' range array.-
Fields inherited from class moa.classifiers.lazy.neighboursearch.NearestNeighbourSearch
m_DistanceFunction, m_Instances, m_kNN, m_MeasurePerformance
-
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
addInstanceInfo(Instance instance)
Adds one instance to KDTree loosly.protected void
addInstanceToTree(Instance inst, KDTreeNode node)
Recursively adds an instance to the tree starting from the supplied KDTreeNode.protected void
afterAddInstance(KDTreeNode node)
Corrects the start and end indices of a KDTreeNode after an instance is added to the tree.void
assignSubToCenters(KDTreeNode node, Instances centers, int[] centList, int[] assignments)
Assigns instances of this node to center.protected void
buildKDTree(Instances instances)
Builds the KDTree on the supplied set of instances/points.protected boolean
candidateIsFullOwner(KDTreeNode node, Instance candidate, Instance competitor)
Returns true if candidate is a full owner in respect to a competitor.void
centerInstances(Instances centers, int[] assignments, double pc)
Assigns instances to centers using KDTree.protected void
checkMissing(Instance ins)
Checks if there is any missing value in the given instance.protected void
checkMissing(Instances instances)
Checks if there is any instance with missing values.protected boolean
clipToInsideHrect(KDTreeNode node, Instance x)
Finds the closest point in the hyper rectangle to a given point.protected void
determineAssignments(KDTreeNode node, Instances centers, int[] candidates, int[] assignments, double pc)
Assigns instances to the current centers called candidates.protected double
distanceToHrect(KDTreeNode node, Instance x)
Returns the distance between a point and an hyperrectangle.Enumeration
enumerateMeasures()
Returns an enumeration of the additional measure names.protected void
findNearestNeighbours(Instance target, KDTreeNode node, int k, NearestNeighbourSearch.MyHeap heap, double distanceToParents)
Returns (in the supplied heap object) the k nearest neighbours of the given instance starting from the give tree node.DistanceFunction
getDistanceFunction()
returns the distance function currently in use.double[]
getDistances()
Returns the distances to the kNearest or 1 nearest neighbour currently found with either the kNearestNeighbours or the nearestNeighbour method.int
getMaxInstInLeaf()
Get the maximum number of instances in a leaf.protected double
getMaxRelativeNodeWidth(double[][] nodeRanges, double[][] universe)
Returns the maximum attribute width of instances/points in a KDTreeNode relative to the whole dataset.double
getMeasure(String additionalMeasureName)
Returns the value of the named measure.double
getMinBoxRelWidth()
Gets the minimum relative box width.KDTreeNodeSplitter
getNodeSplitter()
Returns the splitting method currently in use to split the nodes of the KDTree.boolean
getNormalizeNodeWidth()
Gets the normalize flag.String
globalInfo()
Returns a string describing this nearest neighbour search algorithm.Instances
kNearestNeighbours(Instance target, int k)
Returns the k nearest neighbours of the supplied instance.String
maxInstInLeafTipText()
Tip text for this property.double
measureMaxDepth()
Returns the depth of the tree.double
measureNumLeaves()
Returns the number of leaves.double
measureTreeSize()
Returns the size of the tree.String
minBoxRelWidthTipText()
Tip text for this property.Instance
nearestNeighbour(Instance target)
Returns the nearest neighbour of the supplied target instance.String
nodeSplitterTipText()
Returns the tip text for this property.String
normalizeNodeWidthTipText()
Tip text for this property.protected int[]
refineOwners(KDTreeNode node, Instances centers, int[] candidates)
Refines the ownerlist.void
setDistanceFunction(DistanceFunction df)
sets the distance function to use for nearest neighbour search.void
setInstances(Instances instances)
Builds the KDTree on the given set of instances.void
setMaxInstInLeaf(int i)
Sets the maximum number of instances in a leaf.void
setMeasurePerformance(boolean measurePerformance)
Sets whether to calculate the performance statistics or not.void
setMinBoxRelWidth(double i)
Sets the minimum relative box width.void
setNodeSplitter(KDTreeNodeSplitter splitter)
Sets the splitting method to use to split the nodes of the KDTree.void
setNormalizeNodeWidth(boolean n)
Sets the flag for normalizing the widths of a KDTree Node by the width of the dimension in the universe.protected void
splitNodes(KDTreeNode node, double[][] universe, int depth)
Recursively splits nodes of a tree starting from the supplied node.void
update(Instance instance)
Adds one instance to the KDTree.protected int
widestDim(double[][] nodeRanges, double[][] universe)
Returns the widest dimension/attribute in a KDTreeNode (widest after normalizing).-
Methods inherited from class moa.classifiers.lazy.neighboursearch.NearestNeighbourSearch
combSort11, distanceFunctionTipText, getInstances, getMeasurePerformance, measurePerformanceTipText, partition, quickSort
-
-
-
-
Field Detail
-
m_DistanceList
protected double[] m_DistanceList
Array holding the distances of the nearest neighbours. It is filled up both by nearestNeighbour() and kNearestNeighbours().
-
m_InstList
protected int[] m_InstList
Indexlist of the instances of this kdtree. Instances get sorted according to the splits. the nodes of the KDTree just hold their start and end indices
-
m_Root
protected KDTreeNode m_Root
The root node of the tree.
-
m_Splitter
protected KDTreeNodeSplitter m_Splitter
The node splitter.
-
m_NumNodes
protected int m_NumNodes
Tree stats.
-
m_NumLeaves
protected int m_NumLeaves
Tree stats.
-
m_MaxDepth
protected int m_MaxDepth
Tree stats.
-
MIN
public static final int MIN
The index of MIN value in attributes' range array.- See Also:
- Constant Field Values
-
MAX
public static final int MAX
The index of MAX value in attributes' range array.- See Also:
- Constant Field Values
-
WIDTH
public static final int WIDTH
The index of WIDTH (MAX-MIN) value in attributes' range array.- See Also:
- Constant Field Values
-
m_EuclideanDistance
protected EuclideanDistance m_EuclideanDistance
The euclidean distance function to use.
-
m_MinBoxRelWidth
protected double m_MinBoxRelWidth
minimal relative width of a KDTree rectangle.
-
m_MaxInstInLeaf
protected int m_MaxInstInLeaf
maximal number of instances in a leaf.
-
-
Constructor Detail
-
KDTree
public KDTree()
Creates a new instance of KDTree.
-
KDTree
public KDTree(Instances insts)
Creates a new instance of KDTree. It also builds the tree on supplied set of Instances.- Parameters:
insts
- The instances/points on which the BallTree should be built on.
-
-
Method Detail
-
buildKDTree
protected void buildKDTree(Instances instances) throws Exception
Builds the KDTree on the supplied set of instances/points. It is adviseable to run the replace missing attributes filter on the passed instances first. NOTE: This method should not be called from outside this class. Outside classes should call setInstances(Instances) instead.- Parameters:
instances
- The instances to build the tree on- Throws:
Exception
- if something goes wrong
-
splitNodes
protected void splitNodes(KDTreeNode node, double[][] universe, int depth) throws Exception
Recursively splits nodes of a tree starting from the supplied node. The splitting stops for any node for which the number of instances/points falls below a given threshold (given by m_MaxInstInLeaf), or if the maximum relative width/range of the instances/points (i.e. max_i(max(att_i) - min(att_i)) ) falls below a given threshold (given by m_MinBoxRelWidth).- Parameters:
node
- The node to start splitting from.universe
- The attribute ranges of the whole dataset.depth
- The depth of the supplied node.- Throws:
Exception
- If there is some problem splitting.
-
findNearestNeighbours
protected void findNearestNeighbours(Instance target, KDTreeNode node, int k, NearestNeighbourSearch.MyHeap heap, double distanceToParents) throws Exception
Returns (in the supplied heap object) the k nearest neighbours of the given instance starting from the give tree node. >k neighbours are returned if there are more than one neighbours at the kth boundary. NOTE: This method should not be used from outside this class. Outside classes should call kNearestNeighbours(Instance, int).- Parameters:
target
- The instance to find the nearest neighbours for.node
- The KDTreeNode to start the search from.k
- The number of neighbours to find.heap
- The MyHeap object to store/update the kNNs found during the search.distanceToParents
- The distance of the supplied target to the parents of the supplied tree node.- Throws:
Exception
- if the nearest neighbour could not be found.
-
kNearestNeighbours
public Instances kNearestNeighbours(Instance target, int k) throws Exception
Returns the k nearest neighbours of the supplied instance. >k neighbours are returned if there are more than one neighbours at the kth boundary.- Specified by:
kNearestNeighbours
in classNearestNeighbourSearch
- Parameters:
target
- The instance to find the nearest neighbours for.k
- The number of neighbours to find.- Returns:
- The k nearest neighbours (or >k if more there are than one neighbours at the kth boundary).
- Throws:
Exception
- if the nearest neighbour could not be found.
-
nearestNeighbour
public Instance nearestNeighbour(Instance target) throws Exception
Returns the nearest neighbour of the supplied target instance.- Specified by:
nearestNeighbour
in classNearestNeighbourSearch
- Parameters:
target
- The instance to find the nearest neighbour for.- Returns:
- The nearest neighbour from among the previously supplied training instances.
- Throws:
Exception
- if the neighbours could not be found.
-
getDistances
public double[] getDistances() throws Exception
Returns the distances to the kNearest or 1 nearest neighbour currently found with either the kNearestNeighbours or the nearestNeighbour method.- Specified by:
getDistances
in classNearestNeighbourSearch
- Returns:
- array containing the distances of the nearestNeighbours. The length and ordering of the array is the same as that of the instances returned by nearestNeighbour functions.
- Throws:
Exception
- if called before calling kNearestNeighbours or nearestNeighbours.
-
setInstances
public void setInstances(Instances instances) throws Exception
Builds the KDTree on the given set of instances.- Overrides:
setInstances
in classNearestNeighbourSearch
- Parameters:
instances
- The insts on which the KDTree is to be built.- Throws:
Exception
- If some error occurs while building the KDTree
-
update
public void update(Instance instance) throws Exception
Adds one instance to the KDTree. This updates the KDTree structure to take into account the newly added training instance.- Specified by:
update
in classNearestNeighbourSearch
- Parameters:
instance
- the instance to be added. Usually the newly added instance in the training set.- Throws:
Exception
- If the instance cannot be added.
-
addInstanceToTree
protected void addInstanceToTree(Instance inst, KDTreeNode node) throws Exception
Recursively adds an instance to the tree starting from the supplied KDTreeNode. NOTE: This should not be called by outside classes, outside classes should instead call update(Instance) method.- Parameters:
inst
- The instance to add to the treenode
- The node to start the recursive search from, for the leaf node where the supplied instance would go.- Throws:
Exception
- If some error occurs while adding the instance.
-
afterAddInstance
protected void afterAddInstance(KDTreeNode node)
Corrects the start and end indices of a KDTreeNode after an instance is added to the tree. The start and end indices for the master index array (m_InstList) stored in the nodes need to be updated for all nodes in the subtree on the right of a node where the instance was added. NOTE: No outside class should call this method.- Parameters:
node
- KDTreeNode whose start and end indices need to be updated.
-
addInstanceInfo
public void addInstanceInfo(Instance instance)
Adds one instance to KDTree loosly. It only changes the ranges in EuclideanDistance, and does not affect the structure of the KDTree.- Overrides:
addInstanceInfo
in classNearestNeighbourSearch
- Parameters:
instance
- the new instance. Usually this is the test instance supplied to update the range of attributes in the distance function.
-
checkMissing
protected void checkMissing(Instances instances) throws Exception
Checks if there is any instance with missing values. Throws an exception if there is, as KDTree does not handle missing values.- Parameters:
instances
- the instances to check- Throws:
Exception
- if missing values are encountered
-
checkMissing
protected void checkMissing(Instance ins) throws Exception
Checks if there is any missing value in the given instance.- Parameters:
ins
- The instance to check missing values in.- Throws:
Exception
- If there is a missing value in the instance.
-
getMaxRelativeNodeWidth
protected double getMaxRelativeNodeWidth(double[][] nodeRanges, double[][] universe)
Returns the maximum attribute width of instances/points in a KDTreeNode relative to the whole dataset.- Parameters:
nodeRanges
- The attribute ranges of the KDTreeNode whose maximum relative width is to be determined.universe
- The attribute ranges of the whole dataset (training instances + test instances so far encountered).- Returns:
- The maximum relative width
-
widestDim
protected int widestDim(double[][] nodeRanges, double[][] universe)
Returns the widest dimension/attribute in a KDTreeNode (widest after normalizing).- Parameters:
nodeRanges
- The attribute ranges of the KDTreeNode.universe
- The attribute ranges of the whole dataset (training instances + test instances so far encountered).- Returns:
- The index of the widest dimension/attribute.
-
measureTreeSize
public double measureTreeSize()
Returns the size of the tree.- Returns:
- the size of the tree
-
measureNumLeaves
public double measureNumLeaves()
Returns the number of leaves.- Returns:
- the number of leaves
-
measureMaxDepth
public double measureMaxDepth()
Returns the depth of the tree.- Returns:
- The depth of the tree
-
enumerateMeasures
public Enumeration enumerateMeasures()
Returns an enumeration of the additional measure names.- Returns:
- an enumeration of the measure names
-
getMeasure
public double getMeasure(String additionalMeasureName)
Returns the value of the named measure.- Parameters:
additionalMeasureName
- the name of the measure to query for its value.- Returns:
- The value of the named measure
- Throws:
IllegalArgumentException
- If the named measure is not supported.
-
setMeasurePerformance
public void setMeasurePerformance(boolean measurePerformance)
Sets whether to calculate the performance statistics or not.- Parameters:
measurePerformance
- Should be true if performance statistics are to be measured.
-
centerInstances
public void centerInstances(Instances centers, int[] assignments, double pc) throws Exception
Assigns instances to centers using KDTree.- Parameters:
centers
- the current centersassignments
- the centerindex for each instancepc
- the threshold value for pruning.- Throws:
Exception
- If there is some problem assigning instances to centers.
-
determineAssignments
protected void determineAssignments(KDTreeNode node, Instances centers, int[] candidates, int[] assignments, double pc) throws Exception
Assigns instances to the current centers called candidates.- Parameters:
node
- The node to start assigning the instances from.centers
- all the current centers.candidates
- the current centers the method works on.assignments
- the center index for each instance.pc
- the threshold value for pruning.- Throws:
Exception
- If there is some problem assigning instances to centers.
-
refineOwners
protected int[] refineOwners(KDTreeNode node, Instances centers, int[] candidates) throws Exception
Refines the ownerlist.- Parameters:
node
- The current tree node.centers
- all centerscandidates
- the indexes of those centers that are candidates.- Returns:
- list of owners
- Throws:
Exception
- If some problem occurs in refining.
-
distanceToHrect
protected double distanceToHrect(KDTreeNode node, Instance x) throws Exception
Returns the distance between a point and an hyperrectangle.- Parameters:
node
- The current node from whose hyperrectangle the distance is to be measured.x
- the point- Returns:
- the distance
- Throws:
Exception
- If some problem occurs in determining the distance to the hyperrectangle.
-
clipToInsideHrect
protected boolean clipToInsideHrect(KDTreeNode node, Instance x)
Finds the closest point in the hyper rectangle to a given point. Change the given point to this closest point by clipping of at all the dimensions to be clipped of. If the point is inside the rectangle it stays unchanged. The return value is true if the point was not changed, so the the return value is true if the point was inside the rectangle.- Parameters:
node
- The current KDTreeNode in whose hyperrectangle the closest point is to be found.x
- a point- Returns:
- true if the input point stayed unchanged.
-
candidateIsFullOwner
protected boolean candidateIsFullOwner(KDTreeNode node, Instance candidate, Instance competitor) throws Exception
Returns true if candidate is a full owner in respect to a competitor.The candidate has been the closer point to the current rectangle or even has been a point within the rectangle. The competitor is competing with the candidate for a few points out of the rectangle although it is a point further away from the rectangle then the candidate. The extrem point is the corner of the rectangle that is furthest away from the candidate towards the direction of the competitor. If the distance candidate to this extreme point is smaller then the distance competitor to this extreme point, then it is proven that none of the points in the rectangle can be owned be the competitor and the candidate is full owner of the rectangle in respect to this competitor. See also D. Pelleg and A. Moore's paper 'Accelerating exact k-means Algorithms with Geometric Reasoning'.
- Parameters:
node
- The current KDTreeNode / hyperrectangle.candidate
- instance that is candidate to be ownercompetitor
- instance that competes against the candidate- Returns:
- true if candidate is full owner
- Throws:
Exception
- If some problem occurs.
-
assignSubToCenters
public void assignSubToCenters(KDTreeNode node, Instances centers, int[] centList, int[] assignments) throws Exception
Assigns instances of this node to center. Center to be assign to is decided by the distance function.- Parameters:
node
- The KDTreeNode whose instances are to be assigned.centers
- all the input centerscentList
- the list of centers to work withassignments
- index list of last assignments- Throws:
Exception
- If there is error assigning the instances.
-
minBoxRelWidthTipText
public String minBoxRelWidthTipText()
Tip text for this property.- Returns:
- the tip text for this property
-
setMinBoxRelWidth
public void setMinBoxRelWidth(double i)
Sets the minimum relative box width.- Parameters:
i
- the minimum relative box width
-
getMinBoxRelWidth
public double getMinBoxRelWidth()
Gets the minimum relative box width.- Returns:
- the minimum relative box width
-
maxInstInLeafTipText
public String maxInstInLeafTipText()
Tip text for this property.- Returns:
- the tip text for this property
-
setMaxInstInLeaf
public void setMaxInstInLeaf(int i)
Sets the maximum number of instances in a leaf.- Parameters:
i
- the maximum number of instances in a leaf
-
getMaxInstInLeaf
public int getMaxInstInLeaf()
Get the maximum number of instances in a leaf.- Returns:
- the maximum number of instances in a leaf
-
normalizeNodeWidthTipText
public String normalizeNodeWidthTipText()
Tip text for this property.- Returns:
- the tip text for this property
-
setNormalizeNodeWidth
public void setNormalizeNodeWidth(boolean n)
Sets the flag for normalizing the widths of a KDTree Node by the width of the dimension in the universe.- Parameters:
n
- true to use normalizing.
-
getNormalizeNodeWidth
public boolean getNormalizeNodeWidth()
Gets the normalize flag.- Returns:
- True if normalizing
-
getDistanceFunction
public DistanceFunction getDistanceFunction()
returns the distance function currently in use.- Overrides:
getDistanceFunction
in classNearestNeighbourSearch
- Returns:
- the distance function
-
setDistanceFunction
public void setDistanceFunction(DistanceFunction df) throws Exception
sets the distance function to use for nearest neighbour search.- Overrides:
setDistanceFunction
in classNearestNeighbourSearch
- Parameters:
df
- the distance function to use- Throws:
Exception
- if not EuclideanDistance
-
nodeSplitterTipText
public String nodeSplitterTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNodeSplitter
public KDTreeNodeSplitter getNodeSplitter()
Returns the splitting method currently in use to split the nodes of the KDTree.- Returns:
- The KDTreeNodeSplitter currently in use.
-
setNodeSplitter
public void setNodeSplitter(KDTreeNodeSplitter splitter)
Sets the splitting method to use to split the nodes of the KDTree.- Parameters:
splitter
- The KDTreeNodeSplitter to use.
-
globalInfo
public String globalInfo()
Returns a string describing this nearest neighbour search algorithm.- Overrides:
globalInfo
in classNearestNeighbourSearch
- Returns:
- a description of the algorithm for displaying in the explorer/experimenter gui
-
-