| Modifier and Type | Class and Description |
|---|---|
| class | AbstractAssociator: Abstract scheme for learning associations. |
| class | Apriori: Class implementing an Apriori-type algorithm. |
| class | FilteredAssociator: Class for running an arbitrary associator on data that has been passed through an arbitrary filter. |
| class | FPGrowth: Class implementing the FP-growth algorithm for finding large item sets without candidate generation. |
| class | SingleAssociatorEnhancer: Abstract utility class for handling settings common to meta associators that use a single base associator. |
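
The level-wise search behind Apriori-type algorithms can be illustrated in plain Java. This is a minimal sketch under stated assumptions, not Weka's Apriori code. AprioriSketch and frequentItemsets are hypothetical names, transactions are sets of strings, and support is an absolute count. Only frequent itemsets are mined; rule generation and confidence filtering are left out.

```java
import java.util.*;

/** Hypothetical level-wise (Apriori-style) frequent itemset miner; illustrative only. */
final class AprioriSketch {
    /** Returns every itemset whose support count is at least minSupport, mapped to that count. */
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> result = new LinkedHashMap<>();
        // Level 1: the candidates are all distinct single items.
        Set<Set<String>> candidates = new LinkedHashSet<>();
        for (Set<String> t : transactions)
            for (String item : t) candidates.add(new TreeSet<>(Set.of(item)));
        while (!candidates.isEmpty()) {
            // Count the support of each candidate with one pass over the data.
            List<Set<String>> frequent = new ArrayList<>();
            for (Set<String> c : candidates) {
                int count = 0;
                for (Set<String> t : transactions) if (t.containsAll(c)) count++;
                if (count >= minSupport) { result.put(c, count); frequent.add(c); }
            }
            // Join frequent k-itemsets into (k+1)-candidates and prune any candidate
            // that has an infrequent k-subset (the Apriori property).
            Set<Set<String>> next = new LinkedHashSet<>();
            for (int i = 0; i < frequent.size(); i++)
                for (int j = i + 1; j < frequent.size(); j++) {
                    TreeSet<String> union = new TreeSet<>(frequent.get(i));
                    union.addAll(frequent.get(j));
                    if (union.size() != frequent.get(i).size() + 1) continue;
                    boolean allSubsetsFrequent = true;
                    for (String item : union) {
                        TreeSet<String> subset = new TreeSet<>(union);
                        subset.remove(item);
                        if (!result.containsKey(subset)) { allSubsetsFrequent = false; break; }
                    }
                    if (allSubsetsFrequent) next.add(union);
                }
            candidates = next;
        }
        return result;
    }
}
```

Counting support for every candidate against every transaction keeps the sketch short. Real implementations use smarter counting structures.
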

| Modifier and Type | Class and Description |
|---|---|
| class | ASEvaluation: Abstract attribute selection evaluation class. |
| class | AttributeSetEvaluator: Abstract attribute set evaluator. |
| class | CfsSubsetEval: Evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. Subsets of features that are highly correlated with the class while having low intercorrelation are preferred. For more information, see: M. … |
| class | CorrelationAttributeEval: Evaluates the worth of an attribute by measuring the correlation (Pearson's) between it and the class. Nominal attributes are considered on a value-by-value basis by treating each value as an indicator. |
| class | GainRatioAttributeEval: Evaluates the worth of an attribute by measuring the gain ratio with respect to the class: $\mathrm{GainR}(\text{Class}, \text{Attribute}) = \frac{H(\text{Class}) - H(\text{Class} \mid \text{Attribute})}{H(\text{Attribute})}$. |
| class | HoldOutSubsetEvaluator: Abstract attribute subset evaluator capable of evaluating subsets with respect to a data set that is distinct from the one used to initialize/train the subset evaluator. |
| class | InfoGainAttributeEval: Evaluates the worth of an attribute by measuring the information gain with respect to the class: $\mathrm{InfoGain}(\text{Class}, \text{Attribute}) = H(\text{Class}) - H(\text{Class} \mid \text{Attribute})$. |
| class | OneRAttributeEval: Evaluates the worth of an attribute by using the OneR classifier. |
| class | ReliefFAttributeEval: Evaluates the worth of an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instances of the same and of a different class. |
| class | SymmetricalUncertAttributeEval: Evaluates the worth of an attribute by measuring the symmetrical uncertainty with respect to the class. |
| class | UnsupervisedAttributeEvaluator: Abstract unsupervised attribute evaluator. |
| class | UnsupervisedSubsetEvaluator: Abstract unsupervised attribute subset evaluator. |
| class | WrapperSubsetEval: Evaluates attribute sets by using a learning scheme. |
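
The entropy-based scores in the table above (information gain, gain ratio and symmetrical uncertainty) can be checked with a small, self-contained computation. This is a sketch, not Weka's code. EntropyScores is a hypothetical class, and its input is assumed to be a contingency table of counts: rows are attribute values and columns are classes. Symmetrical uncertainty is taken as the standard 2 (H(Class) - H(Class | Attribute)) / (H(Class) + H(Attribute)).

```java
/** Hypothetical helper: entropy-based attribute scores from a contingency table
 *  counts[v][c] = number of instances with attribute value v and class c. */
final class EntropyScores {
    /** Shannon entropy in bits of a vector of counts. */
    static double entropy(double[] counts) {
        double total = 0, h = 0;
        for (double c : counts) total += c;
        for (double c : counts)
            if (c > 0) { double p = c / total; h -= p * Math.log(p) / Math.log(2); }
        return h;
    }

    /** H(Class): entropy of the column totals. */
    static double classEntropy(double[][] counts) {
        double[] classTotals = new double[counts[0].length];
        for (double[] row : counts)
            for (int c = 0; c < row.length; c++) classTotals[c] += row[c];
        return entropy(classTotals);
    }

    /** H(Attribute): entropy of the row totals. */
    static double attributeEntropy(double[][] counts) {
        double[] valueTotals = new double[counts.length];
        for (int v = 0; v < counts.length; v++)
            for (double x : counts[v]) valueTotals[v] += x;
        return entropy(valueTotals);
    }

    /** H(Class | Attribute): class entropy within each value, weighted by value frequency. */
    static double conditionalEntropy(double[][] counts) {
        double total = 0;
        for (double[] row : counts) for (double x : row) total += x;
        double h = 0;
        for (double[] row : counts) {
            double rowTotal = 0;
            for (double x : row) rowTotal += x;
            if (rowTotal > 0) h += (rowTotal / total) * entropy(row);
        }
        return h;
    }

    static double infoGain(double[][] counts) {
        return classEntropy(counts) - conditionalEntropy(counts);
    }

    static double gainRatio(double[][] counts) {
        return infoGain(counts) / attributeEntropy(counts);
    }

    static double symmetricalUncertainty(double[][] counts) {
        return 2 * infoGain(counts) / (classEntropy(counts) + attributeEntropy(counts));
    }
}
```

Take the outlook attribute of the classic weather data, with yes/no counts {2, 3}, {4, 0} and {3, 2}. For it, the sketch gives an information gain of about 0.247 bits and a gain ratio of about 0.156.
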

| Modifier and Type | Class and Description |
|---|---|
| class | AbstractClassifier: Abstract classifier. |
| class | IteratedSingleClassifierEnhancer: Abstract utility class for handling settings common to meta classifiers that build an ensemble from a single base learner. |
| class | MultipleClassifiersCombiner: Abstract utility class for handling settings common to meta classifiers that build an ensemble from multiple classifiers. |
| class | ParallelIteratedSingleClassifierEnhancer: Abstract utility class for handling settings common to meta classifiers that build an ensemble in parallel from a single base learner. |
| class | ParallelMultipleClassifiersCombiner: Abstract utility class for handling settings common to meta classifiers that build an ensemble in parallel using multiple classifiers. |
| class | RandomizableClassifier: Abstract utility class for handling settings common to randomizable classifiers. |
| class | RandomizableIteratedSingleClassifierEnhancer: Abstract utility class for handling settings common to randomizable meta classifiers that build an ensemble from a single base learner. |
| class | RandomizableMultipleClassifiersCombiner: Abstract utility class for handling settings common to randomizable meta classifiers that build an ensemble from multiple classifiers based on a given random number seed. |
| class | RandomizableParallelIteratedSingleClassifierEnhancer: Abstract utility class for handling settings common to randomizable meta classifiers that build an ensemble in parallel from a single base learner. |
| class | RandomizableParallelMultipleClassifiersCombiner: Abstract utility class for handling settings common to meta classifiers that build an ensemble in parallel using multiple classifiers based on a given random number seed. |
| class | RandomizableSingleClassifierEnhancer: Abstract utility class for handling settings common to randomizable meta classifiers that build an ensemble from a single base learner. |
| class | SingleClassifierEnhancer: Abstract utility class for handling settings common to meta classifiers that use a single base learner. |

| Modifier and Type | Class and Description |
|---|---|
| class | BayesNet: Bayes Network learning using various search algorithms and quality measures. Base class for a Bayes Network classifier. |
| class | NaiveBayes: Class for a Naive Bayes classifier using estimator classes. |
| class | NaiveBayesMultinomial: Class for building and using a multinomial Naive Bayes classifier. |
| class | NaiveBayesMultinomialText: Multinomial Naive Bayes for text data. |
| class | NaiveBayesMultinomialUpdateable: Class for building and using a multinomial Naive Bayes classifier. |
| class | NaiveBayesUpdateable: Class for a Naive Bayes classifier using estimator classes. |

| Modifier and Type | Class and Description |
|---|---|
| class | BayesNetGenerator: Bayes Network learning using various search algorithms and quality measures. Base class for a Bayes Network classifier. |
| class | BIFReader: Builds a description of a Bayes Net classifier stored in XML BIF 0.3 format. For more details on XML BIF, see: Fabio Cozman, Marek Druzdzel, Daniel Garcia (1998). |
| class | EditableBayesNet: Bayes Network learning using various search algorithms and quality measures. Base class for a Bayes Network classifier. |

| Modifier and Type | Class and Description |
|---|---|
| class | DiscreteEstimatorBayes: Symbolic probability estimator based on symbol counts and a prior. |
| class | DiscreteEstimatorFullBayes: Symbolic probability estimator based on symbol counts and a prior. |
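
A symbolic estimator "based on symbol counts and a prior" can be sketched as counts smoothed by a pseudo-count. This is an assumption-laden illustration, not Weka's DiscreteEstimatorBayes. CountPriorEstimator is a hypothetical class. Symbols are integer indices, and the prior is modeled as the same pseudo-count added to every symbol, so P(s) = (count(s) + prior) / (total + k · prior) for k symbols.

```java
/** Hypothetical estimator: symbol counts plus a uniform pseudo-count prior. */
final class CountPriorEstimator {
    private final double[] counts;
    private final double prior;
    private double total;

    CountPriorEstimator(int numSymbols, double prior) {
        this.counts = new double[numSymbols];
        this.prior = prior;
    }

    /** Records one observation of the given symbol with the given weight. */
    void addValue(int symbol, double weight) {
        counts[symbol] += weight;
        total += weight;
    }

    /** P(symbol) = (count + prior) / (total + numSymbols * prior). */
    double getProbability(int symbol) {
        return (counts[symbol] + prior) / (total + counts.length * prior);
    }
}
```

With a prior of 1, this reduces to Laplace smoothing. Unseen symbols therefore never receive probability zero.
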

| Modifier and Type | Class and Description |
|---|---|
| class | GaussianProcesses: Implements Gaussian processes for regression without hyperparameter tuning. |
| class | LinearRegression: Class for using linear regression for prediction. |
| class | Logistic: Class for building and using a multinomial logistic regression model with a ridge estimator. There are some modifications, however, compared to the paper of le Cessie and van Houwelingen (1992). If there are $k$ classes for $n$ instances with $m$ attributes, the parameter matrix $B$ to be calculated will be an $m \times (k-1)$ matrix. The probability for class $j$, with the exception of the last class, is $P_j(X_i) = \frac{\exp(X_i B_j)}{\sum_{l=1}^{k-1} \exp(X_i B_l) + 1}$. The last class has probability $1 - \sum_{j=1}^{k-1} P_j(X_i) = \frac{1}{\sum_{l=1}^{k-1} \exp(X_i B_l) + 1}$. The (negative) multinomial log-likelihood is thus $L = -\sum_{i=1}^{n} \Big[ \sum_{j=1}^{k-1} Y_{ij} \ln P_j(X_i) + \big(1 - \sum_{j=1}^{k-1} Y_{ij}\big) \ln\big(1 - \sum_{j=1}^{k-1} P_j(X_i)\big) \Big] + \text{ridge} \cdot B^2$. To find the matrix $B$ for which $L$ is minimised, a quasi-Newton method is used to search for the optimized values of the $m(k-1)$ variables. |
| class | MultilayerPerceptron: A classifier that uses backpropagation to classify instances. This network can be built by hand, created by an algorithm, or both. |
| class | SGD: Implements stochastic gradient descent for learning various linear models (binary class SVM, binary class logistic regression, squared loss, Huber loss and epsilon-insensitive loss linear regression). |
| class | SGDText: Implements stochastic gradient descent for learning a linear binary class SVM or binary class logistic regression on text data. |
| class | SimpleLinearRegression: Learns a simple linear regression model. |
| class | SimpleLogistic: Classifier for building linear logistic regression models. |
| class | SMO: Implements John Platt's sequential minimal optimization algorithm for training a support vector classifier. This implementation globally replaces all missing values and transforms nominal attributes into binary ones. |
| class | SMOreg: SMOreg implements the support vector machine for regression. |
| class | VotedPerceptron: Implementation of the voted perceptron algorithm by Freund and Schapire. |
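
The class probabilities and the penalized likelihood given for Logistic above can be evaluated directly. This minimal sketch makes several assumptions:

- B is stored as an m × (k-1) array, and each instance is a length-m vector. An intercept would be an extra constant attribute.
- Classes are indices 0..k-1, with k-1 as the reference (last) class.
- "ridge · B²" means ridge times the sum of the squared entries of B.

MultinomialLogistic is a hypothetical name, and the quasi-Newton search for B is not included.

```java
/** Hypothetical evaluator for the multinomial logistic model with the last class as reference. */
final class MultinomialLogistic {
    /** Returns the k class probabilities for one instance x, given B (m rows, k-1 columns). */
    static double[] probabilities(double[][] B, double[] x) {
        int m = B.length, km1 = B[0].length;
        double[] expScores = new double[km1];
        double denom = 1.0;                       // the "+1" contributed by the last class
        for (int j = 0; j < km1; j++) {
            double score = 0;
            for (int a = 0; a < m; a++) score += x[a] * B[a][j];   // X_i B_j
            expScores[j] = Math.exp(score);
            denom += expScores[j];
        }
        double[] p = new double[km1 + 1];
        for (int j = 0; j < km1; j++) p[j] = expScores[j] / denom;
        p[km1] = 1.0 / denom;                     // last (reference) class
        return p;
    }

    /** Negative log-likelihood plus ridge * (sum of squared entries of B); y[i] is a class index. */
    static double penalizedNll(double[][] B, double[][] X, int[] y, double ridge) {
        double loss = 0;
        // With one-hot Y, the bracketed sum in L reduces to ln P_{y_i}(X_i).
        for (int i = 0; i < X.length; i++) loss -= Math.log(probabilities(B, X[i])[y[i]]);
        double sq = 0;
        for (double[] row : B) for (double b : row) sq += b * b;
        return loss + ridge * sq;
    }
}
```
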

| Modifier and Type | Class and Description |
|---|---|
| class | CachedKernel: Base class for RBFKernel and PolyKernel that implements a simple LRU (least-recently-used) cache. |
| class | Kernel: Abstract kernel. |
| class | NormalizedPolyKernel: The normalized polynomial kernel: $K(x,y) = \langle x,y\rangle / \sqrt{\langle x,x\rangle \langle y,y\rangle}$, where $\langle x,y\rangle = \mathrm{PolyKernel}(x,y)$. |
| class | PolyKernel: The polynomial kernel: $K(x,y) = \langle x,y\rangle^p$ or $K(x,y) = (\langle x,y\rangle + 1)^p$. |
| class | PrecomputedKernelMatrixKernel: This kernel is based on a static kernel matrix that is read from a file. |
| class | Puk: The Pearson VII function-based universal kernel. For more information, see: B. … |
| class | RBFKernel: The RBF kernel. |
| class | StringKernel: Implementation of the subsequence kernel (SSK) as described in [1] and of the subsequence kernel with lambda pruning (SSK-LP) as described in [2]. For more information, see Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, Christopher J. … |
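
The PolyKernel and NormalizedPolyKernel formulas above translate directly into code. The sketch also includes the common RBF form K(x, y) = exp(-γ ‖x - y‖²). That RBF form is an assumption here, since the table does not state the formula. Kernels and its methods are hypothetical names that operate on dense double arrays rather than Weka instances.

```java
/** Hypothetical dense-vector kernels for illustration. */
final class Kernels {
    static double dot(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    /** K(x,y) = <x,y>^p, or (<x,y> + 1)^p when lowerOrder is true. */
    static double poly(double[] x, double[] y, double p, boolean lowerOrder) {
        return Math.pow(dot(x, y) + (lowerOrder ? 1.0 : 0.0), p);
    }

    /** K(x,y) = poly(x,y) / sqrt(poly(x,x) * poly(y,y)); equals 1 when x == y. */
    static double normalizedPoly(double[] x, double[] y, double p, boolean lowerOrder) {
        return poly(x, y, p, lowerOrder)
            / Math.sqrt(poly(x, x, p, lowerOrder) * poly(y, y, p, lowerOrder));
    }

    /** K(x,y) = exp(-gamma * ||x - y||^2). */
    static double rbf(double[] x, double[] y, double gamma) {
        double d2 = 0;
        for (int i = 0; i < x.length; i++) { double d = x[i] - y[i]; d2 += d * d; }
        return Math.exp(-gamma * d2);
    }
}
```

Normalization divides out each vector's own kernel value. As a result, the normalized kernel depends only on the "angle" in feature space, not on vector length.
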

| Modifier and Type | Class and Description |
|---|---|
| class | IBk: K-nearest neighbours classifier. |
| class | KStar: K* is an instance-based classifier; that is, the class of a test instance is based upon the class of those training instances similar to it, as determined by some similarity function. |
| class | LWL: Locally weighted learning. |
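
The basic idea behind IBk, k-nearest neighbours, fits in a few lines. This sketch assumes numeric attributes only, unweighted Euclidean distance and a plain majority vote. KnnSketch is a hypothetical name. Practical implementations typically add attribute normalization, distance weighting and faster neighbour search.

```java
import java.util.*;

/** Hypothetical k-nearest-neighbour classifier: Euclidean distance, majority vote. */
final class KnnSketch {
    static int classify(double[][] train, int[] labels, double[] query, int k) {
        double[] dist = new double[train.length];
        for (int i = 0; i < train.length; i++) {
            double d2 = 0;
            for (int a = 0; a < query.length; a++) { double d = train[i][a] - query[a]; d2 += d * d; }
            dist[i] = d2;   // squared distance gives the same ordering as Euclidean distance
        }
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> dist[i]));
        // Count votes among the k closest training instances.
        Map<Integer, Integer> votes = new HashMap<>();
        for (int n = 0; n < Math.min(k, order.length); n++) votes.merge(labels[order[n]], 1, Integer::sum);
        int best = -1, bestVotes = -1;
        for (Map.Entry<Integer, Integer> e : votes.entrySet())
            if (e.getValue() > bestVotes) { best = e.getKey(); bestVotes = e.getValue(); }
        return best;
    }
}
```
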

| Modifier and Type | Class and Description |
|---|---|
| class | AdaBoostM1: Class for boosting a nominal class classifier using the AdaBoost M1 method. |
| class | AdditiveRegression: Meta classifier that enhances the performance of a regression base classifier. |
| class | AttributeSelectedClassifier: Dimensionality of training and test data is reduced by attribute selection before being passed on to a classifier. |
| class | Bagging: Class for bagging a classifier to reduce variance. |
| class | ClassificationViaRegression: Class for doing classification using regression methods. |
| class | CostSensitiveClassifier: A metaclassifier that makes its base classifier cost-sensitive. |
| class | CVParameterSelection: Class for performing parameter selection by cross-validation for any classifier. For more information, see: R. … |
| class | FilteredClassifier: Class for running an arbitrary classifier on data that has been passed through an arbitrary filter. |
| class | LogitBoost: Class for performing additive logistic regression. |
| class | MultiClassClassifier: A metaclassifier for handling multi-class datasets with 2-class classifiers. |
| class | MultiClassClassifierUpdateable: A metaclassifier for handling multi-class datasets with 2-class classifiers. |
| class | MultiScheme: Class for selecting a classifier from among several using cross-validation on the training data or the performance on the training data. |
| class | RandomCommittee: Class for building an ensemble of randomizable base classifiers. |
| class | RandomizableFilteredClassifier: Class for running an arbitrary classifier on data that has been passed through an arbitrary filter. |
| class | RandomSubSpace: This method constructs a decision tree based classifier that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. |
| class | RegressionByDiscretization: A regression scheme that employs any classifier on a copy of the data that has the class attribute (equal-width) discretized. |
| class | Stacking: Combines several classifiers using the stacking method. |
| class | Vote: Class for combining classifiers. |
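
Bagging, listed above as a way to reduce variance, rests on two ingredients: bootstrap resampling and combining the resulting models. This hypothetical BaggingSketch shows both. It draws bootstrap bags of indices and combines predictions by a simple majority vote, which is one common choice. Weka's Bagging may combine predictions differently, for example by averaging class distributions. Training the base learners is left out.

```java
import java.util.*;

/** Hypothetical helpers showing bootstrap resampling and majority voting. */
final class BaggingSketch {
    /** Draws n indices uniformly with replacement from 0..n-1 (one bootstrap bag). */
    static int[] bootstrapIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = rng.nextInt(n);
        return idx;
    }

    /** Returns the most frequent prediction; ties go to the smallest class index. */
    static int majorityVote(int[] predictions, int numClasses) {
        int[] counts = new int[numClasses];
        for (int p : predictions) counts[p]++;
        int best = 0;
        for (int c = 1; c < numClasses; c++) if (counts[c] > counts[best]) best = c;
        return best;
    }
}
```

Each base model would be trained on the instances selected by its own bag. Because bags differ, the models' errors are partly decorrelated, which is where the variance reduction comes from.
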

| Modifier and Type | Class and Description |
|---|---|
| class | InputMappedClassifier: Wrapper classifier that addresses incompatible training and test data by building a mapping between the training data that a classifier has been built with and the incoming test instances' structure. |
| class | SerializedClassifier: A wrapper around a serialized classifier model. |

| Modifier and Type | Class and Description |
|---|---|
| class | GeneralRegression: Class implementing import of PMML General Regression model. |
| class | NeuralNetwork: Class implementing import of PMML Neural Network model. |
| class | PMMLClassifier: Abstract base class for all PMML classifiers. |
| class | Regression: Class implementing import of PMML Regression model. |
| class | RuleSetModel: Class implementing import of PMML RuleSetModel. |
| class | SupportVectorMachineModel: Implements a PMML SupportVectorMachineModel. |
| class | TreeModel: Class implementing import of PMML TreeModel. |

| Modifier and Type | Class and Description |
|---|---|
| class | DecisionTable: Class for building and using a simple decision table majority classifier. For more information, see: Ron Kohavi: The Power of Decision Tables. |
| class | JRip: This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. … |
| class | M5Rules: Generates a decision list for regression problems using separate-and-conquer. |
| class | OneR: Class for building and using a 1R classifier; in other words, uses the minimum-error attribute for prediction, discretizing numeric attributes. |
| class | PART: Class for generating a PART decision list. |
| class | ZeroR: Class for building and using a 0-R classifier. |
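
ZeroR (predict the majority class) and OneR (use the single minimum-error attribute) can be sketched on nominal data. Values and classes are assumed to be encoded as small integers. OneRSketch is a hypothetical class. Unlike Weka's OneR, it does not discretize numeric attributes or handle missing values.

```java
/** Hypothetical ZeroR/OneR on integer-coded nominal data: data[i][a] is the value of
 *  attribute a for instance i, labels[i] is its class. */
final class OneRSketch {
    /** ZeroR: predict the majority class regardless of the attributes. */
    static int zeroR(int[] labels, int numClasses) {
        int[] counts = new int[numClasses];
        for (int y : labels) counts[y]++;
        int best = 0;
        for (int c = 1; c < numClasses; c++) if (counts[c] > counts[best]) best = c;
        return best;
    }

    /** OneR: index of the attribute whose one-level rule (each value -> its majority
     *  class) makes the fewest errors on the training data. */
    static int bestAttribute(int[][] data, int[] labels, int numValues, int numClasses) {
        int bestAttr = -1, bestErrors = Integer.MAX_VALUE;
        for (int a = 0; a < data[0].length; a++) {
            int[][] counts = new int[numValues][numClasses];
            for (int i = 0; i < data.length; i++) counts[data[i][a]][labels[i]]++;
            int errors = 0;
            for (int[] row : counts) {
                int total = 0, max = 0;
                for (int c : row) { total += c; max = Math.max(max, c); }
                errors += total - max;   // instances outside the value's majority class
            }
            if (errors < bestErrors) { bestErrors = errors; bestAttr = a; }
        }
        return bestAttr;
    }
}
```
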

| Modifier and Type | Class and Description |
|---|---|
| class | MakeDecList: Class for handling a decision list. |

| Modifier and Type | Class and Description |
|---|---|
| class | DecisionStump: Class for building and using a decision stump. |
| class | HoeffdingTree: A Hoeffding tree (VFDT) is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. |
| class | J48: Class for generating a pruned or unpruned C4.5 decision tree. |
| class | LMT: Classifier for building 'logistic model trees', which are classification trees with logistic regression functions at the leaves. |
| class | M5P: M5Base. |
| class | RandomForest: Class for constructing a forest of random trees. For more information, see: Leo Breiman (2001). |
| class | RandomTree: Class for constructing a tree that considers K randomly chosen attributes at each node. |
| class | REPTree: Fast decision tree learner. |
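
Hoeffding trees (VFDT) decide when a stream has supplied enough examples to split a leaf by using the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)). Here R is the range of the split merit, for example log2 of the number of classes for information gain. A leaf splits when the best attribute's merit beats the runner-up by more than ε. The sketch below assumes a tie threshold that forces a split once ε becomes tiny. HoeffdingSketch and its parameters are hypothetical names, not Weka's API.

```java
/** Hypothetical Hoeffding-bound split test for VFDT-style trees. */
final class HoeffdingSketch {
    /** epsilon = sqrt(R^2 * ln(1/delta) / (2n)); shrinks as n grows. */
    static double bound(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /** Split if the merit gap exceeds epsilon, or if epsilon fell below the tie threshold. */
    static boolean shouldSplit(double bestMerit, double secondMerit, double range,
                               double delta, long n, double tieThreshold) {
        double eps = bound(range, delta, n);
        return bestMerit - secondMerit > eps || eps < tieThreshold;
    }
}
```

Because ε depends only on R, δ and n, the test needs no access to past examples. That property is what lets the tree learn from a stream in one pass.
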

| Modifier and Type | Class and Description |
|---|---|
| class | C45PruneableClassifierTree: Class for handling a tree structure that can be pruned using C4.5 procedures. |
| class | ClassifierTree: Class for handling a tree structure used for classification. |
| class | NBTreeClassifierTree: Class for handling a naive Bayes tree structure used for classification. |
| class | PruneableClassifierTree: Class for handling a tree structure that can be pruned using a pruning set. |

| Modifier and Type | Class and Description |
|---|---|
| class | LMTNode: Class for logistic model tree structure. |
| class | LogisticBase: Base/helper class for building logistic regression models with the LogitBoost algorithm. |

| Modifier and Type | Class and Description |
|---|---|
| class | M5Base: M5Base. |
| class | PreConstructedLinearModel: This class encapsulates a linear regression function. |
| class | RuleNode: Constructs a node for use in an M5 tree or rule. |

| Modifier and Type | Class and Description |
|---|---|
| class | AbstractClusterer: Abstract clusterer. |
| class | AbstractDensityBasedClusterer: Abstract clustering model that produces (for each test instance) an estimate of the membership in each cluster (i.e. …) |
| class | Canopy: Cluster data using the canopy clustering algorithm, which requires just one pass over the data. |
| class | Cobweb: Class implementing the Cobweb and Classit clustering algorithms. Note: the application of node operators (merging, splitting, etc.) in terms of ordering and priority differs (and is somewhat ambiguous) between the original Cobweb and Classit papers. |
| class | EM: Simple EM (expectation maximisation) class. EM assigns a probability distribution to each instance which indicates the probability of it belonging to each of the clusters. |
| class | FarthestFirst: Cluster data using the FarthestFirst algorithm. For more information, see: Hochbaum, Shmoys (1985). |
| class | FilteredClusterer: Class for running an arbitrary clusterer on data that has been passed through an arbitrary filter. |
| class | HierarchicalClusterer: Hierarchical clustering class. |
| class | MakeDensityBasedClusterer: Class for wrapping a Clusterer to make it return a distribution and density. |
| class | RandomizableClusterer: Abstract utility class for handling settings common to randomizable clusterers. |
| class | RandomizableDensityBasedClusterer: Abstract utility class for handling settings common to randomizable clusterers. |
| class | RandomizableSingleClustererEnhancer: Abstract utility class for handling settings common to randomizable clusterers. |
| class | SimpleKMeans: Cluster data using the k-means algorithm. |
| class | SingleClustererEnhancer: Meta-clusterer for enhancing a base clusterer. |
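
The k-means algorithm behind SimpleKMeans alternates an assignment step and a centroid update step until assignments stop changing. This Lloyd-style sketch makes simplifying assumptions: the first k points are the initial centroids (Weka offers other initialization methods), distances are squared Euclidean, and there is no missing-value handling or normalization. KMeansSketch is a hypothetical name.

```java
import java.util.*;

/** Hypothetical Lloyd-style k-means on dense vectors. */
final class KMeansSketch {
    /** Returns the cluster index of each point after at most maxIter iterations. */
    static int[] cluster(double[][] points, int k, int maxIter) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();   // assumed initialization
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestD = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int a = 0; a < dim; a++) { double diff = points[i][a] - centroids[c][a]; d += diff * diff; }
                    if (d < bestD) { bestD = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;   // converged
            // Update step: each centroid moves to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] sizes = new int[k];
            for (int i = 0; i < points.length; i++) {
                sizes[assign[i]]++;
                for (int a = 0; a < dim; a++) sums[assign[i]][a] += points[i][a];
            }
            for (int c = 0; c < k; c++)
                if (sizes[c] > 0)
                    for (int a = 0; a < dim; a++) centroids[c][a] = sums[c][a] / sizes[c];
        }
        return assign;
    }
}
```
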

| Modifier and Type | Interface and Description |
|---|---|
| interface | MultiInstanceCapabilitiesHandler: Multi-instance classifiers can specify an additional Capabilities object for the data in the relational attribute, since the format of multi-instance data is fixed to "bag/NOMINAL,data/RELATIONAL,class". |
| interface | PartitionGenerator: This interface can be implemented by algorithms that generate a partition of the instance space (e.g., decision trees). |

| Modifier and Type | Class and Description |
|---|---|
| class | FindWithCapabilities: Locates all classes with certain capabilities. |

| Modifier and Type | Method and Description |
|---|---|
| CapabilitiesHandler | TestInstances.getHandler(): Returns the CapabilitiesHandler currently set to generate the dataset for; can be null. |
| CapabilitiesHandler | FindWithCapabilities.getHandler(): Returns the CapabilitiesHandler currently set to generate the dataset for; can be null. |
| CapabilitiesHandler | Capabilities.getOwner(): Returns the owner of this Capabilities object. |

| Modifier and Type | Method and Description |
|---|---|
| void | TestInstances.setHandler(CapabilitiesHandler value): Sets the CapabilitiesHandler to generate the data for. |
| void | FindWithCapabilities.setHandler(CapabilitiesHandler value): Sets the CapabilitiesHandler to generate the data for. |
| void | Capabilities.setOwner(CapabilitiesHandler value): Sets the owner of this Capabilities object. |

| Constructor and Description |
|---|
| Capabilities(CapabilitiesHandler owner): Initializes the capabilities for the given owner. |

| Modifier and Type | Class and Description |
|---|---|
| class | AbstractFileSaver: Abstract class for Savers that save to a file. Valid options are: -i input arff file, the input file in ARFF format. |
| class | AbstractSaver: Abstract class for Savers. |
| class | ArffSaver: Writes to a destination in ARFF text format. |
| class | C45Saver: Writes to a destination that is in the format used by the C4.5 algorithm. Therefore it outputs a names file and a data file. |
| class | CSVSaver: Writes to a destination that is in CSV (comma-separated values) format. |
| class | DatabaseSaver: Writes to a database (tested with MySQL, InstantDB, HSQLDB). |
| class | JSONSaver: Writes to a destination that is in JSON format. The data can be compressed with gzip in order to save space. For more information, see the JSON homepage: http://www.json.org/ |
| class | LibSVMSaver: Writes to a destination that is in libsvm format. For more information about libsvm, see: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ |
| class | MatlabSaver: Writes Matlab ASCII files, in single or double precision format. |
| class | SerializedInstancesSaver: Serializes the instances to a file with extension .bsi. |
| class | SVMLightSaver: Writes to a destination that is in SVMlight format. For more information about SVMlight, see: http://svmlight.joachims.org/ |
| class | XRFFSaver: Writes to a destination that is in the XML version of the ARFF format. |
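
The libsvm text format written by LibSVMSaver is sparse. Each line holds the label followed by index:value pairs, with 1-based ascending indices, and zero-valued features are omitted. This sketch formats one instance from a dense array. LibSvmLine is a hypothetical helper, not Weka's saver, and its number formatting (integers without ".0") is a presentational assumption.

```java
/** Hypothetical formatter for one line of libsvm's sparse text format. */
final class LibSvmLine {
    /** Returns "<label> <index>:<value> ..." with 1-based indices and zeros skipped. */
    static String format(double label, double[] features) {
        StringBuilder sb = new StringBuilder(fmt(label));
        for (int i = 0; i < features.length; i++)
            if (features[i] != 0) sb.append(' ').append(i + 1).append(':').append(fmt(features[i]));
        return sb.toString();
    }

    /** Prints integral values without a trailing ".0". */
    private static String fmt(double v) {
        return v == Math.rint(v) && !Double.isInfinite(v) ? Long.toString((long) v) : Double.toString(v);
    }
}
```

For example, label 1 with features {0, 2.5, 0, 3} becomes the line "1 2:2.5 4:3".
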

| Modifier and Type | Class and Description |
|---|---|
| class | DiscreteEstimator: Simple symbolic probability estimator based on symbol counts. |
| class | Estimator: Abstract class for all estimators. |
| class | KernelEstimator: Simple kernel density estimator. |
| class | MahalanobisEstimator: Simple probability estimator that places a single normal distribution over the observed values. |
| class | NormalEstimator: Simple probability estimator that places a single normal distribution over the observed values. |
| class | PoissonEstimator: Simple probability estimator that places a single Poisson distribution over the observed values. |
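
The difference between NormalEstimator (one normal over all observed values) and KernelEstimator (a kernel density estimate) can be shown side by side. This sketch makes three assumptions: the normal fit uses the population standard deviation, the kernel is Gaussian with a fixed, caller-supplied bandwidth h, and observations are unweighted. Weka's estimators also handle weights and value precision, which are omitted here. DensitySketch is a hypothetical name.

```java
/** Hypothetical single-normal and Gaussian-kernel density estimates over observed values. */
final class DensitySketch {
    static double normalPdf(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    /** Fits one normal (mean, population standard deviation) and evaluates it at x. */
    static double normalEstimate(double[] data, double x) {
        double mean = 0;
        for (double v : data) mean += v;
        mean /= data.length;
        double var = 0;
        for (double v : data) var += (v - mean) * (v - mean);
        return normalPdf(x, mean, Math.sqrt(var / data.length));
    }

    /** Averages Gaussian kernels of width h centred on each observation. */
    static double kernelEstimate(double[] data, double x, double h) {
        double sum = 0;
        for (double v : data) sum += normalPdf(x, v, h);
        return sum / data.length;
    }
}
```

The single normal is smooth but unimodal. The kernel estimate can follow several modes, at the cost of choosing h.
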

| Modifier and Type | Class and Description |
|---|---|
| class | AllFilter: A simple instance filter that passes all instances directly through. |
| class | Filter: An abstract class for instance filters: objects that take instances as input, carry out some transformation on the instance and then output the instance. |
| class | MultiFilter: Applies several filters successively. |
| class | SimpleBatchFilter: This filter is a superclass for simple batch filters. |
| class | SimpleFilter: This filter contains common behavior of the SimpleBatchFilter and the SimpleStreamFilter. |
| class | SimpleStreamFilter: This filter is a superclass for simple stream filters. |

| Modifier and Type | Class and Description |
|---|---|
| class | AddClassification: A filter for adding the classification, the class distribution and an error flag to a dataset with a classifier. |
| class | AttributeSelection: A supervised attribute filter that can be used to select attributes. |
| class | ClassOrder: Changes the order of the classes so that the class values are no longer in the order specified in the header. |
| class | MergeNominalValues: Merges values of all nominal attributes among the specified attributes, excluding the class attribute, using the CHAID method, but without considering re-splitting merged subsets. |
| class | PartitionMembership: A filter that uses a PartitionGenerator to generate partition membership values; filtered instances are composed of these values plus the class attribute (if set in the input data) and rendered as sparse instances. |

| Modifier and Type | Class and Description |
|---|---|
| class | SpreadSubsample: Produces a random subsample of a dataset. |
| class | StratifiedRemoveFolds: This filter takes a dataset and outputs a specified fold for cross-validation. |

| Modifier and Type | Class and Description |
|---|---|
| class | AbstractTimeSeries: An abstract instance filter that assumes instances form time-series data and performs some merging of attribute values in the current instance with attribute values of some previous (or future) instance. |
| class | Add: An instance filter that adds a new attribute to the dataset. |
| class | AddCluster: A filter that adds a new nominal attribute representing the cluster assigned to each instance by the specified clustering algorithm. Either the clustering algorithm gets built with the first batch of data, or one specifies a serialized clusterer model file to use instead. |
| class | AddExpression: An instance filter that creates a new attribute by applying a mathematical expression to existing attributes. |
| class | AddID: An instance filter that adds an ID attribute to the dataset. |
| class | AddNoise: An instance filter that changes a percentage of a given attribute's values. |
| class | AddUserFields: A filter that adds new attributes with user-specified type and constant value. |
| class | AddValues: Adds the labels from the given list to an attribute if they are missing. |
| class | Center: Centers all numeric attributes in the given dataset to have zero mean (apart from the class attribute, if set). |
| class | ChangeDateFormat: Changes the date format used by a date attribute. |
| class | ClassAssigner: Filter that can set and unset the class index. |
| class | ClusterMembership: A filter that uses a density-based clusterer to generate cluster membership values; filtered instances are composed of these values plus the class attribute (if set in the input data). |
| class | Copy: An instance filter that copies a range of attributes in the dataset. |
| class | Discretize: An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. |
| class | FirstOrder: This instance filter takes a range of N numeric attributes and replaces them with N-1 numeric attributes, the values of which are the difference between consecutive attribute values from the original instance. |
| class | InterquartileRange: A filter for detecting outliers and extreme values based on interquartile ranges. |
| class | KernelFilter: Converts the given set of predictor variables into a kernel matrix. |
| class | MakeIndicator: A filter that creates a new dataset with a Boolean attribute replacing a nominal attribute. |
| class | MathExpression: Modify numeric attributes according to a given expression. |
| class | MergeInfrequentNominalValues: Merges all values of the specified nominal attribute that are sufficiently infrequent. |
| class | MergeManyValues: Merges many values of a nominal attribute into one value. |
| class | MergeTwoValues: Merges two values of a nominal attribute into one value. |
| class | NominalToBinary: Converts all nominal attributes into binary numeric attributes. |
| class | NominalToString: Converts a nominal attribute (i.e. …) |
| class | Normalize: Normalizes all numeric values in the given dataset (apart from the class attribute, if set). |
| class | NumericCleaner: A filter that 'cleanses' the numeric data of values that are too small, too big or very close to a certain value (e.g., 0) and sets these values to a predefined default. |
| class | NumericToBinary: Converts all numeric attributes into binary attributes (apart from the class attribute, if set): if the value of the numeric attribute is exactly zero, the value of the new attribute will be zero. |
| class | NumericToNominal: A filter for turning numeric attributes into nominal ones. |
| class | NumericTransform: Transforms numeric attributes using a given transformation method. |
| class | Obfuscate: A simple instance filter that renames the relation, all attribute names and all nominal (and string) attribute values. |
| class | PartitionedMultiFilter: A filter that applies filters on subsets of attributes and assembles the output into a new dataset. |
| class | PKIDiscretize: Discretizes numeric attributes using equal-frequency binning, where the number of bins is equal to the square root of the number of non-missing values. For more information, see: Ying Yang, Geoffrey I. … |
| class | PotentialClassIgnorer: This filter should be extended by other unsupervised attribute filters to allow processing of the class attribute if that's required. |
| class | PrincipalComponents: Performs a principal components analysis and transformation of the data. Dimensionality reduction is accomplished by choosing enough eigenvectors to account for some percentage of the variance in the original data (default 0.95, i.e. 95%). Based on code of the attribute selection scheme 'PrincipalComponents' by Mark Hall and Gabi Schmidberger. |
| class | RandomProjection: Reduces the dimensionality of the data by projecting it onto a lower-dimensional subspace using a random matrix with columns of unit length (i.e. …) |
| class | RandomSubset: Chooses a random subset of attributes, either an absolute number or a percentage. |
| class | Remove: A filter that removes a range of attributes from the dataset. |
| class | RemoveByName: Removes attributes based on a regular expression matched against their names. |
| class | RemoveType: Removes attributes of a given type. |
| class | RemoveUseless: This filter removes attributes that do not vary at all or that vary too much. |
| class | RenameAttribute: This filter is used for renaming attribute names. Regular expressions can be used in the matching and replacing. See the Javadoc of the java.util.regex.Pattern class for more information: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html |
| class | RenameNominalValues: Renames the values of nominal attributes. |
| class | Reorder: A filter that generates output with a new order of the attributes. |
| class | ReplaceMissingValues: Replaces all missing values for nominal and numeric attributes in a dataset with the modes and means from the training data. |
| class | ReplaceMissingWithUserConstant: Replaces all missing values for nominal, string, numeric and date attributes in the dataset with user-supplied constant values. |
| class | SortLabels: A simple filter for sorting the labels of nominal attributes. |
| class | Standardize: Standardizes all numeric attributes in the given dataset to have zero mean and unit variance (apart from the class attribute, if set). |
| class | StringToNominal: Converts a range of string attributes (unspecified number of values) to nominal (set number of values). |
| class | StringToWordVector: Converts String attributes into a set of attributes representing word occurrence (depending on the tokenizer) information from the text contained in the strings. |
| class | SwapValues: Swaps two values of a nominal attribute. |
| class | TimeSeriesDelta: An instance filter that assumes instances form time-series data and replaces attribute values in the current instance with the difference between the current value and the equivalent attribute value of some previous (or future) instance. |
| class | TimeSeriesTranslate: An instance filter that assumes instances form time-series data and replaces attribute values in the current instance with the equivalent attribute values of some previous (or future) instance. |
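
The Center, Standardize and Normalize filters above all rescale each numeric attribute independently while skipping the class attribute. This sketch applies the three transforms in place to a plain numeric matrix, with classIndex -1 meaning no class. Its assumptions: no missing values, Standardize uses the sample standard deviation, Normalize maps to [0, 1], and constant columns end up as 0. ScalingSketch is a hypothetical name, not Weka's filter API.

```java
/** Hypothetical column-wise Center / Standardize / Normalize on a numeric matrix (in place). */
final class ScalingSketch {
    static double mean(double[][] d, int a) {
        double s = 0;
        for (double[] row : d) s += row[a];
        return s / d.length;
    }

    /** Center: subtract the column mean so each attribute has zero mean. */
    static void center(double[][] d, int classIndex) {
        for (int a = 0; a < d[0].length; a++) {
            if (a == classIndex) continue;
            double m = mean(d, a);
            for (double[] row : d) row[a] -= m;
        }
    }

    /** Standardize: zero mean and unit (sample) variance; constant columns become 0. */
    static void standardize(double[][] d, int classIndex) {
        for (int a = 0; a < d[0].length; a++) {
            if (a == classIndex) continue;
            double m = mean(d, a), ss = 0;
            for (double[] row : d) ss += (row[a] - m) * (row[a] - m);
            double sd = d.length > 1 ? Math.sqrt(ss / (d.length - 1)) : 0;
            for (double[] row : d) row[a] = sd > 0 ? (row[a] - m) / sd : row[a] - m;
        }
    }

    /** Normalize: rescale each attribute linearly into [0, 1]; constant columns become 0. */
    static void normalize(double[][] d, int classIndex) {
        for (int a = 0; a < d[0].length; a++) {
            if (a == classIndex) continue;
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : d) { min = Math.min(min, row[a]); max = Math.max(max, row[a]); }
            double range = max - min;
            for (double[] row : d) row[a] = range > 0 ? (row[a] - min) / range : 0;
        }
    }
}
```

In a train/test setting, the statistics (mean, deviation, min, max) would normally be computed on the training data only and then reused on new data. This sketch does not do that.
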

| Modifier and Type | Class and Description |
|---|---|
| class | NonSparseToSparse: An instance filter that converts all incoming instances into sparse format. |
| class | Randomize: Randomly shuffles the order of instances passed through it. |
| class | RemoveDuplicates: Removes all duplicate instances from the first batch of data it receives. |
| class | RemoveFolds: This filter takes a dataset and outputs a specified fold for cross-validation. |
| class | RemoveFrequentValues: Determines which values (frequent or infrequent ones) of a (nominal) attribute are retained and filters the instances accordingly. |
| class | RemoveMisclassified: A filter that removes instances which are incorrectly classified. |
| class | RemovePercentage: A filter that removes a given percentage of a dataset. |
| class | RemoveRange: A filter that removes a given range of instances of a dataset. |
| class | RemoveWithValues: Filters instances according to the value of an attribute. |
| class | Resample: Produces a random subsample of a dataset using either sampling with replacement or without replacement. |
| class | ReservoirSample: Produces a random subsample of a dataset using the reservoir sampling Algorithm "R" by Vitter. |
| class | SparseToNonSparse: An instance filter that converts all incoming sparse instances into non-sparse format. |
| class | SubsetByExpression: Filters instances according to a user-specified expression. Grammar: `boolexpr_list ::= boolexpr_list boolexpr_part \| boolexpr_part; boolexpr_part ::= boolexpr:e {: parser.setResult(e); :} ; boolexpr ::= BOOLEAN \| true \| false \| expr < expr \| expr <= expr \| expr > expr \| expr >= expr \| expr = expr \| ( boolexpr ) \| not boolexpr \| boolexpr and boolexpr \| boolexpr or boolexpr \| ATTRIBUTE is STRING ; expr ::= NUMBER \| ATTRIBUTE \| ( expr ) \| opexpr \| funcexpr ; opexpr ::= expr + expr \| expr - expr \| expr * expr \| expr / expr ; funcexpr ::= abs ( expr ) \| sqrt ( expr ) \| log ( expr ) \| exp ( expr ) \| sin ( expr ) \| cos ( expr ) \| tan ( expr ) \| rint ( expr ) \| floor ( expr ) \| pow ( expr for base , expr for exponent ) \| ceil ( expr ) ;` Notes: NUMBER is any integer or floating-point number (but not in scientific notation); STRING is any string surrounded by single quotes (the string may not contain a single quote); ATTRIBUTE is one of the following placeholders for attribute values: CLASS for the class value, in case a class attribute is set, or ATTxyz, with xyz a number from 1 to the number of attributes in the dataset, representing the value of the indexed attribute. Examples: extracting only mammals and birds from the 'zoo' UCI dataset: `(CLASS is 'mammal') or (CLASS is 'bird')`; extracting only animals with at least 2 legs from the 'zoo' UCI dataset: `(ATT14 >= 2)`; extracting only instances with non-missing 'wage-increase-second-year' from the 'labor' UCI dataset: `not ismissing(ATT3)`. |

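
ReservoirSample's Algorithm R keeps a uniform random sample of fixed size from a stream of unknown length in a single pass. The first s items fill the reservoir. After that, the n-th item replaces a random slot with probability s/n. This is a generic sketch over any Iterable, not Weka's filter code. ReservoirSketch is a hypothetical name.

```java
import java.util.*;

/** Hypothetical implementation of Vitter's Algorithm R (reservoir sampling). */
final class ReservoirSketch {
    static <T> List<T> sample(Iterable<T> stream, int sampleSize, Random rng) {
        List<T> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        for (T item : stream) {
            seen++;
            if (reservoir.size() < sampleSize) {
                reservoir.add(item);                        // fill the reservoir first
            } else {
                // j is uniform over 0..seen-1, so the item is kept with probability sampleSize / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < sampleSize) reservoir.set((int) j, item);
            }
        }
        return reservoir;
    }
}
```

If the stream is shorter than the requested sample size, the whole stream is returned.
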
Copyright © 2014 University of Waikato, Hamilton, NZ. All Rights Reserved.