Package weka.clusterers
Class SAXKMeans
- java.lang.Object
-
- weka.clusterers.AbstractClusterer
-
- weka.clusterers.RandomizableClusterer
-
- weka.clusterers.SAXKMeans
-
- All Implemented Interfaces:
Serializable
,Cloneable
,weka.clusterers.Clusterer
,weka.clusterers.NumberOfClustersRequestable
,weka.core.CapabilitiesHandler
,weka.core.CapabilitiesIgnorer
,weka.core.CommandlineRunnable
,weka.core.OptionHandler
,weka.core.Randomizable
,weka.core.RevisionHandler
,weka.core.TechnicalInformationHandler
,weka.core.WeightedInstancesHandler
public class SAXKMeans extends weka.clusterers.RandomizableClusterer implements weka.clusterers.NumberOfClustersRequestable, weka.core.WeightedInstancesHandler, weka.core.TechnicalInformationHandler
SimpleKMeans adapted for SAX.
- Version:
- $Revision$
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields Modifier and Type Field Description static int
CANOPY
static int
FARTHEST_FIRST
static int
KMEANS_PLUS_PLUS
protected int[]
m_Assignments
Assignments obtained.protected weka.clusterers.Canopy
m_canopyClusters
The canopy clusterer (if being used)protected List<long[]>
m_centroidCanopyAssignments
Canopies that each centroid falls into (determined by T1 radius)protected weka.core.Instances
m_ClusterCentroids
holds the cluster centroids.protected int[][]
m_ClusterMissingCounts
protected int[][][]
m_ClusterNominalCounts
For each cluster, holds the frequency counts for the values of each nominal attribute.protected int[]
m_ClusterSizes
The number of instances in each cluster.protected weka.core.Instances
m_ClusterStdDevs
Holds the standard deviations of the numeric attributes in each cluster.protected int
m_completed
protected List<long[]>
m_dataPointCanopyAssignments
Canopies that each training instance falls into (determined by T1 radius)protected boolean
m_displayStdDevs
Display standard deviations for numeric atts.protected weka.core.DistanceFunction
m_DistanceFunction
the distance function used.protected boolean
m_dontReplaceMissing
Replace missing values globally?protected int
m_executionSlots
Number of threads to runprotected ExecutorService
m_executorPool
For parallel execution modeprotected int
m_failed
protected boolean
m_FastDistanceCalc
whether to use fast calculation of distances (using a cut-off).protected double[]
m_FullMeansOrMediansOrModes
Stats on the full data set for comparison purposes.protected int[]
m_FullMissingCounts
protected int[][]
m_FullNominalCounts
protected double[]
m_FullStdDevs
protected int
m_initializationMethod
The initialization method to useprotected weka.core.Instances
m_initialStartPoints
Holds the initial start points, as supplied by the initialization method usedprotected int
m_Iterations
Keep track of the number of iterations completed before convergence.protected int
m_maxCanopyCandidates
The maximum number of candidate canopies to hold in memory at any one time (if using canopy clustering)protected int
m_MaxIterations
Maximum number of iterations to be executed.protected double
m_minClusterDensity
The minimum cluster density (according to T2 distance) allowed.protected int
m_NumClusters
number of clusters to generate.protected int
m_periodicPruningRate
Prune low-density candidate canopies after every x instances have been seen (if using canopy clustering)protected boolean
m_PreserveOrder
Preserve order of instances.protected weka.filters.unsupervised.attribute.ReplaceMissingValues
m_ReplaceMissingFilter
replace missing values in training instances.protected boolean
m_speedUpDistanceCompWithCanopies
Whether to reduce the number of distance calcs done by k-means with canopies
protected double[]
m_squaredErrors
Holds the squared errors for all clusters.protected double
m_t1
The t1 radius to pass through to Canopyprotected double
m_t2
The t2 radius to pass through to Canopystatic int
RANDOM
static weka.core.Tag[]
TAGS_SELECTION
Initialization methods
-
Constructor Summary
Constructors Constructor Description SAXKMeans()
the default constructor.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
buildClusterer(weka.core.Instances data)
Generates a clusterer.protected void
canopyInit(weka.core.Instances data)
Initialize with the canopy centers of the Canopy clustering methodString
canopyMaxNumCanopiesToHoldInMemoryTipText()
Returns the tip text for this property.String
canopyMinimumCanopyDensityTipText()
Returns the tip text for this property.String
canopyPeriodicPruningRateTipText()
Returns the tip text for this property.String
canopyT1TipText()
Tip text for this propertyString
canopyT2TipText()
Tip text for this propertyint
clusterInstance(weka.core.Instance instance)
Assigns a given instance to a cluster.
String
displayStdDevsTipText()
Returns the tip text for this property.String
distanceFunctionTipText()
Returns the tip text for this property.String
dontReplaceMissingValuesTipText()
Returns the tip text for this property.protected void
farthestFirstInit(weka.core.Instances data)
Initialize with the farthest-first centers
String
fastDistanceCalcTipText()
Returns the tip text for this property.int[]
getAssignments()
Gets the assignments for each instance.int
getCanopyMaxNumCanopiesToHoldInMemory()
Get the maximum number of candidate canopies to retain in memory during training.double
getCanopyMinimumCanopyDensity()
Get the minimum T2-based density below which a canopy will be pruned during periodic pruning.int
getCanopyPeriodicPruningRate()
Get how often to prune low-density canopies during training (if using canopy clustering)
double
getCanopyT1()
Get the t1 radius to use when canopy clustering is being used as start points and/or to reduce the number of distance calcsdouble
getCanopyT2()
Get the t2 radius to use when canopy clustering is being used as start points and/or to reduce the number of distance calcsweka.core.Capabilities
getCapabilities()
Returns default capabilities of the clusterer.weka.core.Instances
getClusterCentroids()
Gets the cluster centroids.int[][][]
getClusterNominalCounts()
Returns for each cluster the frequency counts for the values of each nominal attribute.int[]
getClusterSizes()
Gets the number of instances in each cluster.weka.core.Instances
getClusterStandardDevs()
Gets the standard deviations of the numeric attributes in each cluster.boolean
getDisplayStdDevs()
Gets whether standard deviations and nominal counts should be displayed in the clustering output.
weka.core.DistanceFunction
getDistanceFunction()
returns the distance function currently in use.boolean
getDontReplaceMissingValues()
Gets whether replacement of missing values is disabled.
boolean
getFastDistanceCalc()
Gets whether to use faster distance calculation.weka.core.SelectedTag
getInitializationMethod()
Get the initialization method to useint
getMaxIterations()
gets the number of maximum iterations to be executed.int
getNumClusters()
gets the number of clusters to generate.int
getNumExecutionSlots()
Get the degree of parallelism to use.String[]
getOptions()
Gets the current settings of SAXKMeans.
boolean
getPreserveInstancesOrder()
Gets whether order of instances must be preserved.boolean
getReduceNumberOfDistanceCalcsViaCanopies()
Get whether to use canopies to reduce the number of distance computations requiredString
getRevision()
Returns the revision string.double
getSquaredError()
Gets the squared error for all clusters.weka.core.TechnicalInformation
getTechnicalInformation()
String
globalInfo()
Returns a string describing this clusterer.String
initializationMethodTipText()
Returns the tip text for this property.protected void
kMeansPlusPlusInit(weka.core.Instances data)
Initialize using the k-means++ methodprotected boolean
launchAssignToClusters(weka.core.Instances insts, int[] clusterAssignments)
Launch the tasks that assign instances to clustersprotected int
launchMoveCentroids(weka.core.Instances[] clusters)
Launch the move centroids tasksEnumeration<weka.core.Option>
listOptions()
Returns an enumeration describing the available options.static void
main(String[] args)
Main method for executing this class.String
maxIterationsTipText()
Returns the tip text for this property.protected double[]
moveCentroid(int centroidIndex, weka.core.Instances members, boolean updateClusterInfo, boolean addToCentroidInstances)
Move the centroid to its new coordinates.
int
numberOfClusters()
Returns the number of clusters.String
numClustersTipText()
Returns the tip text for this property.String
numExecutionSlotsTipText()
Returns the tip text for this propertyString
preserveInstancesOrderTipText()
Returns the tip text for this property.String
reduceNumberOfDistanceCalcsViaCanopiesTipText()
Returns the tip text for this property.void
setCanopyMaxNumCanopiesToHoldInMemory(int max)
Set the maximum number of candidate canopies to retain in memory during training.void
setCanopyMinimumCanopyDensity(double dens)
Set the minimum T2-based density below which a canopy will be pruned during periodic pruning.void
setCanopyPeriodicPruningRate(int p)
Set how often to prune low-density canopies during training (if using canopy clustering)
void
setCanopyT1(double t1)
Set the t1 radius to use when canopy clustering is being used as start points and/or to reduce the number of distance calcsvoid
setCanopyT2(double t2)
Set the t2 radius to use when canopy clustering is being used as start points and/or to reduce the number of distance calcsvoid
setDisplayStdDevs(boolean stdD)
Sets whether standard deviations and nominal counts should be displayed in the clustering output.
void
setDistanceFunction(weka.core.DistanceFunction df)
sets the distance function to use for instance comparison.void
setDontReplaceMissingValues(boolean r)
Sets whether replacement of missing values is disabled.
void
setFastDistanceCalc(boolean value)
Sets whether to use faster distance calculation.void
setInitializationMethod(weka.core.SelectedTag method)
Set the initialization method to usevoid
setMaxIterations(int n)
set the maximum number of iterations to be executed.void
setNumClusters(int n)
set the number of clusters to generate.void
setNumExecutionSlots(int slots)
Set the degree of parallelism to use.void
setOptions(String[] options)
Parses a given list of options.void
setPreserveInstancesOrder(boolean r)
Sets whether order of instances must be preserved.void
setReduceNumberOfDistanceCalcsViaCanopies(boolean c)
Set whether to use canopies to reduce the number of distance computations requiredprotected void
startExecutorPool()
Start the pool of execution threadsString
toString()
return a string describing this clusterer.
-
-
-
Field Detail
-
m_ReplaceMissingFilter
protected weka.filters.unsupervised.attribute.ReplaceMissingValues m_ReplaceMissingFilter
replace missing values in training instances.
-
m_NumClusters
protected int m_NumClusters
number of clusters to generate.
-
m_initialStartPoints
protected weka.core.Instances m_initialStartPoints
Holds the initial start points, as supplied by the initialization method used
-
m_ClusterCentroids
protected weka.core.Instances m_ClusterCentroids
holds the cluster centroids.
-
m_ClusterStdDevs
protected weka.core.Instances m_ClusterStdDevs
Holds the standard deviations of the numeric attributes in each cluster.
-
m_ClusterNominalCounts
protected int[][][] m_ClusterNominalCounts
For each cluster, holds the frequency counts for the values of each nominal attribute.
-
m_ClusterMissingCounts
protected int[][] m_ClusterMissingCounts
-
m_FullMeansOrMediansOrModes
protected double[] m_FullMeansOrMediansOrModes
Stats on the full data set for comparison purposes. For a numeric attribute the value is the mean if the Euclidean distance is being used, or the median if the Manhattan distance is being used; for a nominal attribute its mode is stored.
-
m_FullStdDevs
protected double[] m_FullStdDevs
-
m_FullNominalCounts
protected int[][] m_FullNominalCounts
-
m_FullMissingCounts
protected int[] m_FullMissingCounts
-
m_displayStdDevs
protected boolean m_displayStdDevs
Display standard deviations for numeric atts.
-
m_dontReplaceMissing
protected boolean m_dontReplaceMissing
Replace missing values globally?
-
m_ClusterSizes
protected int[] m_ClusterSizes
The number of instances in each cluster.
-
m_MaxIterations
protected int m_MaxIterations
Maximum number of iterations to be executed.
-
m_Iterations
protected int m_Iterations
Keep track of the number of iterations completed before convergence.
-
m_squaredErrors
protected double[] m_squaredErrors
Holds the squared errors for all clusters.
-
m_DistanceFunction
protected weka.core.DistanceFunction m_DistanceFunction
the distance function used.
-
m_PreserveOrder
protected boolean m_PreserveOrder
Preserve order of instances.
-
m_Assignments
protected int[] m_Assignments
Assignments obtained.
-
m_FastDistanceCalc
protected boolean m_FastDistanceCalc
whether to use fast calculation of distances (using a cut-off).
-
RANDOM
public static final int RANDOM
- See Also:
- Constant Field Values
-
KMEANS_PLUS_PLUS
public static final int KMEANS_PLUS_PLUS
- See Also:
- Constant Field Values
-
CANOPY
public static final int CANOPY
- See Also:
- Constant Field Values
-
FARTHEST_FIRST
public static final int FARTHEST_FIRST
- See Also:
- Constant Field Values
-
TAGS_SELECTION
public static final weka.core.Tag[] TAGS_SELECTION
Initialization methods
-
m_initializationMethod
protected int m_initializationMethod
The initialization method to use
-
m_speedUpDistanceCompWithCanopies
protected boolean m_speedUpDistanceCompWithCanopies
Whether to reduce the number of distance calcs done by k-means with canopies
-
m_centroidCanopyAssignments
protected List<long[]> m_centroidCanopyAssignments
Canopies that each centroid falls into (determined by T1 radius)
-
m_dataPointCanopyAssignments
protected List<long[]> m_dataPointCanopyAssignments
Canopies that each training instance falls into (determined by T1 radius)
-
m_canopyClusters
protected weka.clusterers.Canopy m_canopyClusters
The canopy clusterer (if being used)
-
m_maxCanopyCandidates
protected int m_maxCanopyCandidates
The maximum number of candidate canopies to hold in memory at any one time (if using canopy clustering)
-
m_periodicPruningRate
protected int m_periodicPruningRate
Prune low-density candidate canopies after every x instances have been seen (if using canopy clustering)
-
m_minClusterDensity
protected double m_minClusterDensity
The minimum cluster density (according to T2 distance) allowed. Used when periodically pruning candidate canopies (if using canopy clustering)
-
m_t2
protected double m_t2
The t2 radius to pass through to Canopy
-
m_t1
protected double m_t1
The t1 radius to pass through to Canopy
-
m_executionSlots
protected int m_executionSlots
Number of threads to run
-
m_executorPool
protected transient ExecutorService m_executorPool
For parallel execution mode
-
m_completed
protected int m_completed
-
m_failed
protected int m_failed
-
-
Method Detail
-
startExecutorPool
protected void startExecutorPool()
Start the pool of execution threads
-
getTechnicalInformation
public weka.core.TechnicalInformation getTechnicalInformation()
- Specified by:
getTechnicalInformation
in interfaceweka.core.TechnicalInformationHandler
-
globalInfo
public String globalInfo()
Returns a string describing this clusterer.- Returns:
- a description of the clusterer suitable for displaying in the explorer/experimenter gui
-
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns default capabilities of the clusterer.- Specified by:
getCapabilities
in interfaceweka.core.CapabilitiesHandler
- Specified by:
getCapabilities
in interfaceweka.clusterers.Clusterer
- Overrides:
getCapabilities
in classweka.clusterers.AbstractClusterer
- Returns:
- the capabilities of this clusterer
-
launchMoveCentroids
protected int launchMoveCentroids(weka.core.Instances[] clusters)
Launch the move centroids tasks- Parameters:
clusters
- the cluster centroids- Returns:
- the number of empty clusters
-
launchAssignToClusters
protected boolean launchAssignToClusters(weka.core.Instances insts, int[] clusterAssignments) throws Exception
Launch the tasks that assign instances to clusters- Parameters:
insts
- the instances to be clusteredclusterAssignments
- the array of cluster assignments- Returns:
- true if k means has converged
- Throws:
Exception
- if a problem occurs
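
The assignment step and its convergence test can be illustrated with a minimal sketch. It assumes squared Euclidean distance on plain double arrays instead of the configured DistanceFunction on weka.core.Instances, and it runs sequentially. The names AssignStepSketch, sqDist and assign are hypothetical and not part of this class; only the return convention (true when nothing changed) follows the description above.

```java
final class AssignStepSketch {
    /** Squared Euclidean distance between two points of equal length. */
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Assigns each point to its nearest centroid and stores the result in
     * clusterAssignments. Returns true if no assignment changed, which is
     * the convergence criterion assumed in this sketch.
     */
    static boolean assign(double[][] points, double[][] centroids, int[] clusterAssignments) {
        boolean converged = true;
        for (int i = 0; i < points.length; i++) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = sqDist(points[i], centroids[c]);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            if (clusterAssignments[i] != best) {
                clusterAssignments[i] = best;
                converged = false;
            }
        }
        return converged;
    }
}
```

Calling assign repeatedly, with a centroid update in between, until it returns true gives the usual k-means loop.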
-
buildClusterer
public void buildClusterer(weka.core.Instances data) throws Exception
Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.- Specified by:
buildClusterer
in interfaceweka.clusterers.Clusterer
- Specified by:
buildClusterer
in classweka.clusterers.AbstractClusterer
- Parameters:
data
- set of instances serving as training data- Throws:
Exception
- if the clusterer has not been generated successfully
-
canopyInit
protected void canopyInit(weka.core.Instances data) throws Exception
Initialize with the canopy centers of the Canopy clustering method- Parameters:
data
- the training data- Throws:
Exception
- if a problem occurs
-
farthestFirstInit
protected void farthestFirstInit(weka.core.Instances data) throws Exception
Initialize with the farthest-first centers
- Parameters:
data
- the training data- Throws:
Exception
- if a problem occurs
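
For orientation, farthest-first traversal in its textbook form picks one center at random and then repeatedly adds the point that lies farthest from its nearest already-chosen center. The sketch below assumes squared Euclidean distance on double arrays and k no larger than the number of points; FarthestFirstSketch and its methods are hypothetical names, and the actual method may differ in detail.

```java
import java.util.Arrays;
import java.util.Random;

final class FarthestFirstSketch {
    /** Squared Euclidean distance between two points of equal length. */
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of k centers chosen by farthest-first traversal. */
    static int[] chooseCenters(double[][] points, int k, long seed) {
        int n = points.length;
        int[] centers = new int[k];
        // minDist[i] is the distance from point i to its nearest chosen center.
        double[] minDist = new double[n];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        centers[0] = new Random(seed).nextInt(n);
        for (int c = 1; c < k; c++) {
            int farthest = 0;
            for (int i = 0; i < n; i++) {
                // Account for the center that was added in the previous round.
                minDist[i] = Math.min(minDist[i], sqDist(points[i], points[centers[c - 1]]));
                if (minDist[i] > minDist[farthest]) {
                    farthest = i;
                }
            }
            centers[c] = farthest;
        }
        return centers;
    }
}
```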
-
kMeansPlusPlusInit
protected void kMeansPlusPlusInit(weka.core.Instances data) throws Exception
Initialize using the k-means++ method- Parameters:
data
- the training data- Throws:
Exception
- if a problem occurs
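
As a reminder of what k-means++ seeding does in general: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to the squared distance to the nearest center chosen so far. The sketch below assumes squared Euclidean distance on double arrays; KMeansPlusPlusSketch and its methods are hypothetical names, not this class's implementation.

```java
import java.util.Arrays;
import java.util.Random;

final class KMeansPlusPlusSketch {
    /** Squared Euclidean distance between two points of equal length. */
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of k centers chosen by k-means++ seeding. */
    static int[] chooseCenters(double[][] points, int k, Random rnd) {
        int n = points.length;
        int[] centers = new int[k];
        // d2[i] is the squared distance from point i to its nearest chosen center.
        double[] d2 = new double[n];
        Arrays.fill(d2, Double.POSITIVE_INFINITY);
        centers[0] = rnd.nextInt(n);
        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                d2[i] = Math.min(d2[i], sqDist(points[i], points[centers[c - 1]]));
                total += d2[i];
            }
            // Sample index i with probability d2[i] / total.
            double r = rnd.nextDouble() * total;
            double cumulative = 0.0;
            int pick = n - 1; // fallback if rounding prevents a hit
            for (int i = 0; i < n; i++) {
                cumulative += d2[i];
                if (r < cumulative) {
                    pick = i;
                    break;
                }
            }
            centers[c] = pick;
        }
        return centers;
    }
}
```

Points that are already centers have weight zero, so they are not drawn again unless all remaining weights are zero.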
-
moveCentroid
protected double[] moveCentroid(int centroidIndex, weka.core.Instances members, boolean updateClusterInfo, boolean addToCentroidInstances)
Move the centroid to its new coordinates. Generate the centroid coordinates based on its members (objects assigned to the cluster of the centroid) and the distance function being used.
- Parameters:
centroidIndex
- index of the centroid whose coordinates will be computed
members
- the objects that are assigned to the cluster of this centroid
updateClusterInfo
- whether the method should update the m_Cluster arrays
addToCentroidInstances
- true if the method is to add the computed coordinates to the Instances holding the centroids
- Returns:
- the centroid coordinates
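
How the distance function can influence the centroid is easiest to see on numeric data. The field description of m_FullMeansOrMediansOrModes pairs the mean with Euclidean distance and the median with Manhattan distance; the sketch below applies the same pairing per attribute. It assumes a non-empty cluster with numeric attributes only, and CentroidSketch.centroid is a hypothetical helper, not the method documented here.

```java
import java.util.Arrays;

final class CentroidSketch {
    /**
     * Computes a centroid attribute by attribute: the median of each column
     * if useMedian is true (Manhattan case), otherwise the mean (Euclidean case).
     */
    static double[] centroid(double[][] members, boolean useMedian) {
        int dims = members[0].length;
        double[] result = new double[dims];
        for (int j = 0; j < dims; j++) {
            double[] column = new double[members.length];
            for (int i = 0; i < members.length; i++) {
                column[i] = members[i][j];
            }
            if (useMedian) {
                Arrays.sort(column);
                int mid = column.length / 2;
                result[j] = (column.length % 2 == 1)
                    ? column[mid]
                    : (column[mid - 1] + column[mid]) / 2.0;
            } else {
                double sum = 0.0;
                for (double v : column) {
                    sum += v;
                }
                result[j] = sum / column.length;
            }
        }
        return result;
    }
}
```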
-
clusterInstance
public int clusterInstance(weka.core.Instance instance) throws Exception
Assigns a given instance to a cluster.
- Specified by:
clusterInstance
in interfaceweka.clusterers.Clusterer
- Overrides:
clusterInstance
in classweka.clusterers.AbstractClusterer
- Parameters:
instance
- the instance to be assigned to a cluster- Returns:
- the number of the assigned cluster as an integer
- Throws:
Exception
- if instance could not be classified successfully
-
numberOfClusters
public int numberOfClusters() throws Exception
Returns the number of clusters.- Specified by:
numberOfClusters
in interfaceweka.clusterers.Clusterer
- Specified by:
numberOfClusters
in classweka.clusterers.AbstractClusterer
- Returns:
- the number of clusters generated for a training dataset.
- Throws:
Exception
- if number of clusters could not be returned successfully
-
listOptions
public Enumeration<weka.core.Option> listOptions()
Returns an enumeration describing the available options.- Specified by:
listOptions
in interfaceweka.core.OptionHandler
- Overrides:
listOptions
in classweka.clusterers.RandomizableClusterer
- Returns:
- an enumeration of all the available options.
-
numClustersTipText
public String numClustersTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNumClusters
public void setNumClusters(int n) throws Exception
set the number of clusters to generate.- Specified by:
setNumClusters
in interfaceweka.clusterers.NumberOfClustersRequestable
- Parameters:
n
- the number of clusters to generate- Throws:
Exception
- if number of clusters is negative
-
getNumClusters
public int getNumClusters()
gets the number of clusters to generate.- Returns:
- the number of clusters to generate
-
initializationMethodTipText
public String initializationMethodTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setInitializationMethod
public void setInitializationMethod(weka.core.SelectedTag method)
Set the initialization method to use- Parameters:
method
- the initialization method to use
-
getInitializationMethod
public weka.core.SelectedTag getInitializationMethod()
Get the initialization method to use- Returns:
- the initialization method to use
-
reduceNumberOfDistanceCalcsViaCanopiesTipText
public String reduceNumberOfDistanceCalcsViaCanopiesTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setReduceNumberOfDistanceCalcsViaCanopies
public void setReduceNumberOfDistanceCalcsViaCanopies(boolean c)
Set whether to use canopies to reduce the number of distance computations required- Parameters:
c
- true if canopies are to be used to reduce the number of distance computations
-
getReduceNumberOfDistanceCalcsViaCanopies
public boolean getReduceNumberOfDistanceCalcsViaCanopies()
Get whether to use canopies to reduce the number of distance computations required- Returns:
- true if canopies are to be used to reduce the number of distance computations
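
One plausible reading of this option, given that m_centroidCanopyAssignments and m_dataPointCanopyAssignments hold long[] entries determined by the T1 radius, is that each long[] is a bit set with one bit per canopy, and that a distance between an instance and a centroid only needs to be computed when the two share a canopy. That interpretation is an assumption, as are all names in the sketch below (CanopyFilterSketch, membership, shareCanopy) and the use of Euclidean distance with a strict "less than T1" test.

```java
final class CanopyFilterSketch {
    /** Bit set (one bit per canopy) of the canopies whose center lies within t1 of p. */
    static long[] membership(double[] p, double[][] canopyCenters, double t1) {
        long[] bits = new long[(canopyCenters.length + 63) / 64];
        for (int c = 0; c < canopyCenters.length; c++) {
            double sum = 0.0;
            for (int j = 0; j < p.length; j++) {
                double d = p[j] - canopyCenters[c][j];
                sum += d * d;
            }
            if (Math.sqrt(sum) < t1) {
                bits[c / 64] |= 1L << (c % 64);
            }
        }
        return bits;
    }

    /** True if the two bit sets have at least one canopy in common. */
    static boolean shareCanopy(long[] a, long[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if ((a[i] & b[i]) != 0L) {
                return true;
            }
        }
        return false;
    }
}
```

Under this reading, a centroid whose bit set does not intersect the instance's bit set can be skipped without computing the full distance.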
-
canopyPeriodicPruningRateTipText
public String canopyPeriodicPruningRateTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setCanopyPeriodicPruningRate
public void setCanopyPeriodicPruningRate(int p)
Set how often to prune low-density canopies during training (if using canopy clustering)
- Parameters:
p
- how often (every p instances) to prune low density canopies
-
getCanopyPeriodicPruningRate
public int getCanopyPeriodicPruningRate()
Get how often to prune low-density canopies during training (if using canopy clustering)
- Returns:
- how often (every p instances) to prune low density canopies
-
canopyMinimumCanopyDensityTipText
public String canopyMinimumCanopyDensityTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setCanopyMinimumCanopyDensity
public void setCanopyMinimumCanopyDensity(double dens)
Set the minimum T2-based density below which a canopy will be pruned during periodic pruning.- Parameters:
dens
- the minimum canopy density
-
getCanopyMinimumCanopyDensity
public double getCanopyMinimumCanopyDensity()
Get the minimum T2-based density below which a canopy will be pruned during periodic pruning.- Returns:
- the minimum canopy density
-
canopyMaxNumCanopiesToHoldInMemoryTipText
public String canopyMaxNumCanopiesToHoldInMemoryTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setCanopyMaxNumCanopiesToHoldInMemory
public void setCanopyMaxNumCanopiesToHoldInMemory(int max)
Set the maximum number of candidate canopies to retain in memory during training. T2 distance and data characteristics determine how many candidate canopies are formed before periodic and final pruning are performed. There may not be enough memory available if T2 is set too low.- Parameters:
max
- the maximum number of candidate canopies to retain in memory during training
-
getCanopyMaxNumCanopiesToHoldInMemory
public int getCanopyMaxNumCanopiesToHoldInMemory()
Get the maximum number of candidate canopies to retain in memory during training. T2 distance and data characteristics determine how many candidate canopies are formed before periodic and final pruning are performed. There may not be enough memory available if T2 is set too low.- Returns:
- the maximum number of candidate canopies to retain in memory during training
-
canopyT2TipText
public String canopyT2TipText()
Tip text for this property- Returns:
- the tip text for this property
-
setCanopyT2
public void setCanopyT2(double t2)
Set the t2 radius to use when canopy clustering is being used as start points and/or to reduce the number of distance calcs- Parameters:
t2
- the t2 radius to use
-
getCanopyT2
public double getCanopyT2()
Get the t2 radius to use when canopy clustering is being used as start points and/or to reduce the number of distance calcs- Returns:
- the t2 radius to use
-
canopyT1TipText
public String canopyT1TipText()
Tip text for this property- Returns:
- the tip text for this property
-
setCanopyT1
public void setCanopyT1(double t1)
Set the t1 radius to use when canopy clustering is being used as start points and/or to reduce the number of distance calcs- Parameters:
t1
- the t1 radius to use
-
getCanopyT1
public double getCanopyT1()
Get the t1 radius to use when canopy clustering is being used as start points and/or to reduce the number of distance calcs- Returns:
- the t1 radius to use
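
The -t1 option text under setOptions states that a value below zero is taken as a positive multiplier for T2. A minimal sketch of that rule follows; it assumes T2 has already been resolved to a positive radius (the heuristic used for negative T2 values is not reproduced here), and CanopyRadiusSketch.effectiveT1 is a hypothetical helper.

```java
final class CanopyRadiusSketch {
    /** Resolves T1: a negative t1 means |t1| * t2, any other value is used as given. */
    static double effectiveT1(double t1, double t2) {
        return t1 < 0 ? -t1 * t2 : t1;
    }
}
```

With the documented default of -1.5 and a T2 of 2.0, the resolved T1 radius would be 3.0.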
-
maxIterationsTipText
public String maxIterationsTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMaxIterations
public void setMaxIterations(int n) throws Exception
set the maximum number of iterations to be executed.- Parameters:
n
- the maximum number of iterations- Throws:
Exception
- if the maximum number of iterations is smaller than 1
-
getMaxIterations
public int getMaxIterations()
gets the number of maximum iterations to be executed.- Returns:
- the maximum number of iterations
-
displayStdDevsTipText
public String displayStdDevsTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDisplayStdDevs
public void setDisplayStdDevs(boolean stdD)
Sets whether standard deviations and nominal counts should be displayed in the clustering output.
- Parameters:
stdD
- true if std. devs and counts should be displayed
-
getDisplayStdDevs
public boolean getDisplayStdDevs()
Gets whether standard deviations and nominal counts should be displayed in the clustering output.
- Returns:
- true if std. devs and counts should be displayed
-
dontReplaceMissingValuesTipText
public String dontReplaceMissingValuesTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDontReplaceMissingValues
public void setDontReplaceMissingValues(boolean r)
Sets whether replacement of missing values is disabled.
- Parameters:
r
- true if missing values are not to be replaced
-
getDontReplaceMissingValues
public boolean getDontReplaceMissingValues()
Gets whether replacement of missing values is disabled.
- Returns:
- true if missing values are not to be replaced
-
distanceFunctionTipText
public String distanceFunctionTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getDistanceFunction
public weka.core.DistanceFunction getDistanceFunction()
returns the distance function currently in use.- Returns:
- the distance function
-
setDistanceFunction
public void setDistanceFunction(weka.core.DistanceFunction df) throws Exception
sets the distance function to use for instance comparison.- Parameters:
df
- the new distance function to use- Throws:
Exception
- if instances cannot be processed
-
preserveInstancesOrderTipText
public String preserveInstancesOrderTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setPreserveInstancesOrder
public void setPreserveInstancesOrder(boolean r)
Sets whether order of instances must be preserved.- Parameters:
r
- true if the order of instances is to be preserved
-
getPreserveInstancesOrder
public boolean getPreserveInstancesOrder()
Gets whether order of instances must be preserved.- Returns:
- true if the order of instances is to be preserved
-
fastDistanceCalcTipText
public String fastDistanceCalcTipText()
Returns the tip text for this property.- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setFastDistanceCalc
public void setFastDistanceCalc(boolean value)
Sets whether to use faster distance calculation.- Parameters:
value
- true if faster calculation to be used
-
getFastDistanceCalc
public boolean getFastDistanceCalc()
Gets whether to use faster distance calculation.- Returns:
- true if faster calculation is used
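
A cut-off based distance is commonly implemented by abandoning the summation as soon as the partial distance exceeds the best distance found so far. The sketch below shows that idea for squared Euclidean distance on double arrays; it is an illustration under that assumption, and CutOffDistanceSketch with its methods is hypothetical rather than the distance function used by this class.

```java
final class CutOffDistanceSketch {
    /**
     * Squared Euclidean distance that gives up early: returns
     * Double.POSITIVE_INFINITY as soon as the running sum exceeds cutOff.
     */
    static double sqDist(double[] a, double[] b, double cutOff) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
            if (sum > cutOff) {
                return Double.POSITIVE_INFINITY;
            }
        }
        return sum;
    }

    /** Index of the nearest centroid, using the best distance so far as cut-off. */
    static int nearest(double[] p, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = sqDist(p, centroids[c], bestDist);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }
}
```

Because abandoned computations never yield an exact value, distances to non-winning centroids are unknown in this scheme, which fits the documented note that the fast mode disables the calculation and output of squared errors.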
-
numExecutionSlotsTipText
public String numExecutionSlotsTipText()
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNumExecutionSlots
public void setNumExecutionSlots(int slots)
Set the degree of parallelism to use.- Parameters:
slots
- the number of execution slots (threads) to use; 1 means no parallelism
-
getNumExecutionSlots
public int getNumExecutionSlots()
Get the degree of parallelism to use.- Returns:
- the number of execution slots (threads) to use; 1 means no parallelism
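
To illustrate what a degree of parallelism greater than 1 can mean for the assignment step, the sketch below splits the data into one contiguous chunk per slot and processes the chunks on a java.util.concurrent.ExecutorService, the type of the m_executorPool field. The chunking scheme, the use of squared Euclidean distance on double arrays, and the names ParallelAssignSketch, nearest and assign are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelAssignSketch {
    /** Index of the centroid with the smallest squared Euclidean distance to p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double sum = 0.0;
            for (int j = 0; j < p.length; j++) {
                double d = p[j] - centroids[c][j];
                sum += d * d;
            }
            if (sum < bestDist) {
                bestDist = sum;
                best = c;
            }
        }
        return best;
    }

    /** Assigns all points to their nearest centroid using the given number of slots (at least 1). */
    static int[] assign(double[][] points, double[][] centroids, int slots) {
        int[] assignments = new int[points.length];
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        try {
            int chunk = (points.length + slots - 1) / slots;
            List<Future<?>> futures = new ArrayList<>();
            for (int s = 0; s < slots; s++) {
                final int from = s * chunk;
                final int to = Math.min(points.length, from + chunk);
                if (from >= to) {
                    break;
                }
                // Each task writes only to its own index range, so no locking is needed.
                futures.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        assignments[i] = nearest(points[i], centroids);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface task failures
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return assignments;
    }
}
```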
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a given list of options.
Valid options are:
-N <num> Number of clusters. (default 2).
-init Initialization method to use. 0 = random, 1 = k-means++, 2 = canopy, 3 = farthest first. (default = 0)
-C Use canopies to reduce the number of distance calculations.
-max-candidates <num> Maximum number of candidate canopies to retain in memory at any one time when using canopy clustering. The T2 distance, together with data characteristics, determines how many candidate canopies are formed before periodic and final pruning are performed, which might result in excess memory consumption. This setting avoids large numbers of candidate canopies consuming memory. (default = 100)
-periodic-pruning <num> How often to prune low density canopies when using canopy clustering. (default = every 10,000 training instances)
-min-density Minimum canopy density, when using canopy clustering, below which a canopy will be pruned during periodic pruning. (default = 2 instances)
-t2 The T2 distance to use when using canopy clustering. Values < 0 indicate that a heuristic based on attribute std. deviation should be used to set this. (default = -1.0)
-t1 The T1 distance to use when using canopy clustering. A value < 0 is taken as a positive multiplier for T2. (default = -1.5)
-V Display std. deviations for centroids.
-M Don't replace missing values with mean/mode.
-A <classname and options> Distance function to use. (default: weka.core.SAXDistance)
-I <num> Maximum number of iterations.
-O Preserve order of instances.
-fast Enables faster distance calculations, using cut-off values. Disables the calculation/output of squared errors/distances.
-num-slots <num> Number of execution slots. (default 1 - i.e. no parallelism)
-S <num> Random number seed. (default 10)
-output-debug-info If set, clusterer is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, clusterer capabilities are not checked before clusterer is built (use with caution).
- Specified by:
setOptions
in interfaceweka.core.OptionHandler
- Overrides:
setOptions
in classweka.clusterers.RandomizableClusterer
- Parameters:
options
- the list of options as an array of strings- Throws:
Exception
- if an option is not supported
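
Options are passed as an array of strings in which each flag and each value is a separate element. The sketch below shows such an array for a few of the flags listed above, together with two small lookup helpers. OptionsSketch, valueOf and hasFlag are hypothetical and stand in for whatever parsing the class performs internally; the fallback values in the usage note are the documented defaults.

```java
final class OptionsSketch {
    /** Example: 3 clusters, k-means++ initialization, preserved order, seed 42. */
    static final String[] EXAMPLE = {"-N", "3", "-init", "1", "-O", "-S", "42"};

    /** Returns the value following flag in options, or def if the flag is absent. */
    static String valueOf(String flag, String[] options, String def) {
        for (int i = 0; i + 1 < options.length; i++) {
            if (options[i].equals(flag)) {
                return options[i + 1];
            }
        }
        return def;
    }

    /** Returns true if a switch without a value, such as -O, is present. */
    static boolean hasFlag(String flag, String[] options) {
        for (String option : options) {
            if (option.equals(flag)) {
                return true;
            }
        }
        return false;
    }
}
```

For the example array, valueOf("-N", EXAMPLE, "2") yields "3", while valueOf("-num-slots", EXAMPLE, "1") falls back to "1".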
-
getOptions
public String[] getOptions()
Gets the current settings of SAXKMeans.
- Specified by:
getOptions
in interfaceweka.core.OptionHandler
- Overrides:
getOptions
in classweka.clusterers.RandomizableClusterer
- Returns:
- an array of strings suitable for passing to setOptions()
-
toString
public String toString()
return a string describing this clusterer.
-
getClusterCentroids
public weka.core.Instances getClusterCentroids()
Gets the cluster centroids.- Returns:
- the cluster centroids
-
getClusterStandardDevs
public weka.core.Instances getClusterStandardDevs()
Gets the standard deviations of the numeric attributes in each cluster.- Returns:
- the standard deviations of the numeric attributes in each cluster
-
getClusterNominalCounts
public int[][][] getClusterNominalCounts()
Returns for each cluster the frequency counts for the values of each nominal attribute.- Returns:
- the counts
-
getSquaredError
public double getSquaredError()
Gets the squared error for all clusters.- Returns:
- the squared error, NaN if fast distance calculation is used
- See Also:
m_FastDistanceCalc
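
The relationship between the per-cluster errors in m_squaredErrors and the single value returned here can be sketched as a sum over clusters, with NaN returned when fast distance calculation is enabled. The sketch assumes squared Euclidean distance on double arrays; SquaredErrorSketch and its methods are hypothetical names.

```java
final class SquaredErrorSketch {
    /** Sum of squared distances to the assigned centroid, accumulated per cluster. */
    static double[] perCluster(double[][] points, double[][] centroids, int[] assignments) {
        double[] errors = new double[centroids.length];
        for (int i = 0; i < points.length; i++) {
            double[] c = centroids[assignments[i]];
            for (int j = 0; j < c.length; j++) {
                double d = points[i][j] - c[j];
                errors[assignments[i]] += d * d;
            }
        }
        return errors;
    }

    /** Total squared error over all clusters, or NaN if fast distance calculation is on. */
    static double total(double[] perCluster, boolean fastDistanceCalc) {
        if (fastDistanceCalc) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double e : perCluster) {
            sum += e;
        }
        return sum;
    }
}
```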
-
getClusterSizes
public int[] getClusterSizes()
Gets the number of instances in each cluster.- Returns:
- The number of instances in each cluster
-
getAssignments
public int[] getAssignments() throws Exception
Gets the assignments for each instance.- Returns:
- Array of indexes of the centroid assigned to each instance
- Throws:
Exception
- if order of instances wasn't preserved or no assignments were made
-
getRevision
public String getRevision()
Returns the revision string.- Specified by:
getRevision
in interfaceweka.core.RevisionHandler
- Overrides:
getRevision
in classweka.clusterers.AbstractClusterer
- Returns:
- the revision
-
main
public static void main(String[] args)
Main method for executing this class.- Parameters:
args
- use -h to list all parameters
-
-