Class PrincipalComponentsJ
java.lang.Object
  weka.filters.Filter
    weka.filters.unsupervised.attribute.PrincipalComponentsJ
All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
Direct Known Subclasses:
PublicPrincipalComponents
public class PrincipalComponentsJ extends weka.filters.Filter implements weka.core.OptionHandler, weka.filters.UnsupervisedFilter
Performs a principal components analysis and transformation of the data.
Dimensionality reduction is accomplished by choosing enough eigenvectors to account for some percentage of the variance in the original data (default: 0.95, i.e. 95%).
Based on code from the attribute selection scheme 'PrincipalComponents' by Mark Hall and Gabi Schmidberger.
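The retention rule in the description (the -R proportion, optionally capped by -M) can be illustrated with a small, Weka-independent sketch. It assumes the rule ranks components by eigenvalue and keeps adding them until their cumulative share of the eigenvalue sum reaches the requested proportion. The class and method names (`VarianceCoverage`, `sortDescending`, `numComponents`) are hypothetical and not part of the filter's API.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Sketch (not the filter's code): how many principal components are needed so
 * that their eigenvalues cover at least `coverVariance` of the total, with an
 * optional cap `maxAttributes` (-1 = no cap), mirroring the -R and -M options.
 */
final class VarianceCoverage {

  /** Returns eigenvalue indices ordered by decreasing eigenvalue. */
  static Integer[] sortDescending(double[] eigenvalues) {
    Integer[] idx = new Integer[eigenvalues.length];
    for (int i = 0; i < idx.length; i++) {
      idx[i] = i;
    }
    Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> eigenvalues[i]).reversed());
    return idx;
  }

  /** Number of leading components needed to reach the requested proportion. */
  static int numComponents(double[] eigenvalues, double coverVariance, int maxAttributes) {
    double sum = 0.0;
    for (double e : eigenvalues) {
      sum += e;
    }
    double cumulative = 0.0;
    int kept = 0;
    for (int i : sortDescending(eigenvalues)) {
      cumulative += eigenvalues[i];
      kept++;
      if (cumulative / sum >= coverVariance) {
        break; // requested share of the total variance reached
      }
    }
    // Assumed: the -M style cap is applied after the variance criterion.
    if (maxAttributes > 0 && kept > maxAttributes) {
      kept = maxAttributes;
    }
    return kept;
  }
}
```

For instance, eigenvalues {0.5, 2.5, 1.0, 1.0} give cumulative shares 0.5, 0.7, 0.9 and 1.0, so the default proportion 0.95 keeps all four components, while 0.85 keeps three; a cap of 2 would cut either result to two.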
Valid options are:

-C
  Center (rather than standardize) the data and compute PCA using the covariance (rather than the correlation) matrix.

-R <num>
  Retain enough PC attributes to account for this proportion of variance in the original data.
  (default: 0.95)

-A <num>
  Maximum number of attributes to include in transformed attribute names.
  (-1 = include all, default: 5)

-M <num>
  Maximum number of PC attributes to retain.
  (-1 = include all, default: -1)

-simple-attribute-names
  Whether to simply number the attributes instead of compiling them from other attribute names.
  (default: off)
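The -C option switches between two preprocessing modes. The sketch below, which is not the filter's code, shows the difference in plain Java: centering alone leads to the covariance matrix, while centering plus scaling each column to unit variance (standardizing) leads to the correlation matrix. The class name `ScatterMatrix` is hypothetical. The sketch uses the sample (n - 1) denominator and assumes no column is constant; whether the filter's Center/Standardize steps use the same denominator is not stated on this page.

```java
/**
 * Sketch of the matrix selected by -C (assumed behavior, not Weka's code).
 * center = true : columns are mean-centered, the covariance matrix is returned.
 * center = false: columns are also divided by their standard deviation,
 *                 so the result is the correlation matrix.
 */
final class ScatterMatrix {

  /** data[row][column]; returns an m x m symmetric matrix. */
  static double[][] compute(double[][] data, boolean center) {
    int n = data.length;
    int m = data[0].length;

    // Column means.
    double[] mean = new double[m];
    for (double[] row : data) {
      for (int j = 0; j < m; j++) mean[j] += row[j];
    }
    for (int j = 0; j < m; j++) mean[j] /= n;

    // Sample standard deviations (only used when standardizing).
    double[] sd = new double[m];
    for (double[] row : data) {
      for (int j = 0; j < m; j++) sd[j] += (row[j] - mean[j]) * (row[j] - mean[j]);
    }
    for (int j = 0; j < m; j++) sd[j] = Math.sqrt(sd[j] / (n - 1));

    // Averaged cross-products of the centered (or standardized) columns.
    double[][] out = new double[m][m];
    for (int a = 0; a < m; a++) {
      for (int b = a; b < m; b++) {
        double s = 0.0;
        for (double[] row : data) {
          double xa = row[a] - mean[a];
          double xb = row[b] - mean[b];
          if (!center) { // standardize: unit variance per column
            xa /= sd[a];
            xb /= sd[b];
          }
          s += xa * xb;
        }
        out[a][b] = s / (n - 1);
        out[b][a] = out[a][b];
      }
    }
    return out;
  }
}
```

For the rows (1, 2), (2, 4), (3, 6), the covariance matrix has variances 1 and 4 with covariance 2, while the correlation matrix is 1 everywhere because the second column is an exact multiple of the first. Standardizing is the usual choice when attributes are measured on different scales.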
Version:
$Revision: 12037 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz): attribute selection code
Gabi Schmidberger (gabi@cs.waikato.ac.nz): attribute selection code
fracpete (fracpete at waikato dot ac dot nz): filter code
See Also:
Serialized Form
Field Summary
protected weka.filters.unsupervised.attribute.Remove m_AttributeFilter
  Filter for removing the class attribute and nominal attributes with 0 or 1 values.
protected weka.filters.unsupervised.attribute.Center m_centerFilter
  Filter for centering the data.
protected int m_ClassIndex
  Class index.
protected double[][] m_Correlation
  Correlation matrix for the original data.
protected double m_CoverVariance
  The proportion of variance to cover in the original data when retaining the best n PCs.
protected double[] m_Eigenvalues
  Eigenvalues for the corresponding eigenvectors.
protected double[][] m_Eigenvectors
  Holds the unordered linear transformations of the (normalized) original data.
protected boolean m_HasClass
  Whether the data has a class attribute set.
protected int m_MaxAttributes
  Maximum number of PC attributes in the transformed data (-1 for all).
protected int m_MaxAttrsInName
  Maximum number of attributes to include in a transformed attribute name.
protected weka.filters.unsupervised.attribute.NominalToBinary m_NominalToBinaryFilter
  Filter for turning nominal values into numeric ones.
protected int m_NumAttribs
  Number of attributes.
protected int m_NumInstances
  Number of instances.
protected int m_OutputNumAtts
  The number of attributes in the PC-transformed data.
protected weka.filters.unsupervised.attribute.ReplaceMissingValues m_ReplaceMissingFilter
  Filter for replacing missing values.
protected boolean m_SimpleAttributeNames
  Whether to just number the attributes rather than compiling their names from the original attribute names.
protected int[] m_SortedEigens
  Sorted eigenvalues (stored as indices into m_Eigenvalues).
protected weka.filters.unsupervised.attribute.Standardize m_standardizeFilter
  Filter for standardizing the data.
protected double m_SumOfEigenValues
  Sum of the eigenvalues.
protected weka.core.Instances m_TrainCopy
  A copy of the data, kept for the class attribute (if set).
protected weka.core.Instances m_TrainInstances
  The data to analyse/transform.
protected weka.core.Instances m_TransformedFormat
  The header for the transformed data format.
Constructor Summary
PrincipalComponentsJ()
Method Summary
boolean batchFinished()
  Signify that this batch of input to the filter is finished.
String centerDataTipText()
  Returns the tip text for this property.
protected weka.core.Instance convertInstance(weka.core.Instance instance)
  Transform an instance in the original (unnormalized) format.
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
  Determines the output format based on the input format and returns it.
protected void fillCovariance()
weka.core.Capabilities getCapabilities()
  Returns the capabilities of this filter.
boolean getCenterData()
  Get whether to center (rather than standardize) the data.
int getMaximumAttributeNames()
  Gets the maximum number of attributes to include in transformed attribute names.
int getMaximumAttributes()
  Gets the maximum number of PC attributes to retain.
String[] getOptions()
  Gets the current settings of the filter.
String getRevision()
  Returns the revision string.
boolean getSimpleAttributeNames()
  Get whether to just number the attributes rather than compiling names.
double getVarianceCovered()
  Gets the proportion of total variance to account for when retaining principal components.
String globalInfo()
  Returns a string describing this filter.
boolean input(weka.core.Instance instance)
  Input an instance for filtering.
Enumeration<weka.core.Option> listOptions()
  Returns an enumeration describing the available options.
static void main(String[] args)
  Main method for running this filter.
String maximumAttributeNamesTipText()
  Returns the tip text for this property.
String maximumAttributesTipText()
  Returns the tip text for this property.
void setCenterData(boolean center)
  Set whether to center (rather than standardize) the data.
boolean setInputFormat(weka.core.Instances instanceInfo)
  Sets the format of the input instances.
void setMaximumAttributeNames(int value)
  Sets the maximum number of attributes to include in transformed attribute names.
void setMaximumAttributes(int value)
  Sets the maximum number of PC attributes to retain.
void setOptions(String[] options)
  Parses a list of options for this object.
void setSimpleAttributeNames(boolean value)
  Set whether to just number the attributes rather than compiling names.
protected void setup(weka.core.Instances instances)
  Initializes the filter with the given input data.
void setVarianceCovered(double value)
  Sets the amount of variance to account for when retaining principal components.
String simpleAttributeNamesTipText()
  Returns the tip text for this property.
String varianceCoveredTipText()
  Returns the tip text for this property.
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
Field Detail
-
m_TrainInstances
protected weka.core.Instances m_TrainInstances
The data to analyse/transform.
-
m_TrainCopy
protected weka.core.Instances m_TrainCopy
A copy of the data, kept for the class attribute (if set).
-
m_TransformedFormat
protected weka.core.Instances m_TransformedFormat
The header for the transformed data format.
-
m_HasClass
protected boolean m_HasClass
Whether the data has a class attribute set.
-
m_ClassIndex
protected int m_ClassIndex
Class index.
-
m_NumAttribs
protected int m_NumAttribs
Number of attributes.
-
m_NumInstances
protected int m_NumInstances
Number of instances.
-
m_Correlation
protected double[][] m_Correlation
Correlation matrix for the original data.
-
m_Eigenvectors
protected double[][] m_Eigenvectors
Holds the unordered linear transformations of the (normalized) original data.
-
m_Eigenvalues
protected double[] m_Eigenvalues
Eigenvalues for the corresponding eigenvectors.
-
m_SortedEigens
protected int[] m_SortedEigens
Sorted eigenvalues (stored as indices into m_Eigenvalues).
-
m_SumOfEigenValues
protected double m_SumOfEigenValues
Sum of the eigenvalues.
-
m_ReplaceMissingFilter
protected weka.filters.unsupervised.attribute.ReplaceMissingValues m_ReplaceMissingFilter
Filter for replacing missing values.
-
m_NominalToBinaryFilter
protected weka.filters.unsupervised.attribute.NominalToBinary m_NominalToBinaryFilter
Filter for turning nominal values into numeric ones.
-
m_AttributeFilter
protected weka.filters.unsupervised.attribute.Remove m_AttributeFilter
Filter for removing the class attribute and nominal attributes with 0 or 1 values.
-
m_standardizeFilter
protected weka.filters.unsupervised.attribute.Standardize m_standardizeFilter
Filter for standardizing the data.
-
m_centerFilter
protected weka.filters.unsupervised.attribute.Center m_centerFilter
Filter for centering the data.
-
m_OutputNumAtts
protected int m_OutputNumAtts
The number of attributes in the PC-transformed data.
-
m_CoverVariance
protected double m_CoverVariance
The proportion of variance to cover in the original data when retaining the best n PCs.
-
m_MaxAttrsInName
protected int m_MaxAttrsInName
Maximum number of attributes to include in a transformed attribute name.
-
m_MaxAttributes
protected int m_MaxAttributes
Maximum number of PC attributes in the transformed data (-1 for all).
-
m_SimpleAttributeNames
protected boolean m_SimpleAttributeNames
Whether to just number the attributes rather than compiling their names from the original attribute names.
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
Returns:
  a description of the filter suitable for displaying in the Explorer/Experimenter GUI
-
listOptions
public Enumeration<weka.core.Option> listOptions()
Returns an enumeration describing the available options.
Specified by:
  listOptions in interface weka.core.OptionHandler
Overrides:
  listOptions in class weka.filters.Filter
Returns:
  an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a list of options for this object.
Specified by:
  setOptions in interface weka.core.OptionHandler
Overrides:
  setOptions in class weka.filters.Filter
Parameters:
  options - the list of options as an array of strings
Throws:
  Exception - if an option is not supported
-
getOptions
public String[] getOptions()
Gets the current settings of the filter.
Specified by:
  getOptions in interface weka.core.OptionHandler
Overrides:
  getOptions in class weka.filters.Filter
Returns:
  an array of strings suitable for passing to setOptions
-
centerDataTipText
public String centerDataTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setCenterData
public void setCenterData(boolean center)
Set whether to center (rather than standardize) the data. If set to true, PCA is computed from the covariance rather than the correlation matrix.
Parameters:
  center - true if the data is to be centered rather than standardized
-
getCenterData
public boolean getCenterData()
Get whether to center (rather than standardize) the data. If true, PCA is computed from the covariance rather than the correlation matrix.
Returns:
  true if the data is to be centered rather than standardized.
-
varianceCoveredTipText
public String varianceCoveredTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setVarianceCovered
public void setVarianceCovered(double value)
Sets the amount of variance to account for when retaining principal components.
Parameters:
  value - the proportion of total variance to account for
-
getVarianceCovered
public double getVarianceCovered()
Gets the proportion of total variance to account for when retaining principal components.
Returns:
  the proportion of variance to account for
-
maximumAttributeNamesTipText
public String maximumAttributeNamesTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setMaximumAttributeNames
public void setMaximumAttributeNames(int value)
Sets the maximum number of attributes to include in transformed attribute names.
Parameters:
  value - the maximum number of attributes
-
getMaximumAttributeNames
public int getMaximumAttributeNames()
Gets the maximum number of attributes to include in transformed attribute names.
Returns:
  the maximum number of attributes
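This page does not specify how a transformed attribute name is compiled. One plausible scheme, assumed here purely for illustration, lists the eigenvector coefficients with the largest magnitudes, each followed by its attribute name, stops after the -A limit, and appends "..." when coefficients were left out. The class `PcNames` and its three-decimal number format are hypothetical; the filter's actual names may be formatted differently.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical sketch of compiling a PC attribute name from one eigenvector:
 * largest |coefficient| first, at most maxAttrsInName terms (-1 = all),
 * and "..." if some terms were omitted.
 */
final class PcNames {

  static String name(double[] coeffs, String[] attNames, int maxAttrsInName) {
    // Order attribute indices by decreasing absolute coefficient.
    Integer[] idx = new Integer[coeffs.length];
    for (int i = 0; i < idx.length; i++) {
      idx[i] = i;
    }
    Arrays.sort(idx, (x, y) -> Double.compare(Math.abs(coeffs[y]), Math.abs(coeffs[x])));

    int limit = maxAttrsInName < 0 ? coeffs.length : Math.min(maxAttrsInName, coeffs.length);
    StringBuilder sb = new StringBuilder();
    for (int k = 0; k < limit; k++) {
      double c = coeffs[idx[k]];
      if (k > 0 && c >= 0) {
        sb.append('+'); // negative coefficients already print their own '-'
      }
      sb.append(String.format(Locale.ROOT, "%.3f", c)).append(attNames[idx[k]]);
    }
    if (limit < coeffs.length) {
      sb.append("...");
    }
    return sb.toString();
  }
}
```

With coefficients 0.2, -0.7 and 0.5 on attributes a, b and c, a limit of 2 yields `-0.700b+0.500c...`, and -1 (include all) yields `-0.700b+0.500c+0.200a`.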
-
maximumAttributesTipText
public String maximumAttributesTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setMaximumAttributes
public void setMaximumAttributes(int value)
Sets the maximum number of PC attributes to retain.
Parameters:
  value - the maximum number of attributes
-
getMaximumAttributes
public int getMaximumAttributes()
Gets the maximum number of PC attributes to retain.
Returns:
  the maximum number of attributes
-
simpleAttributeNamesTipText
public String simpleAttributeNamesTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setSimpleAttributeNames
public void setSimpleAttributeNames(boolean value)
Set whether to just number the attributes rather than compiling names.
Parameters:
  value - true to just number the attributes
-
getSimpleAttributeNames
public boolean getSimpleAttributeNames()
Get whether to just number the attributes rather than compiling names.
Returns:
  true if the attributes are just numbered
-
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the capabilities of this filter.
Specified by:
  getCapabilities in interface weka.core.CapabilitiesHandler
Overrides:
  getCapabilities in class weka.filters.Filter
Returns:
  the capabilities of this filter
See Also:
  Capabilities
-
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it. If the output format cannot be returned immediately, i.e., immediateOutputFormat() returns false, this method is called from batchFinished().
Parameters:
  inputFormat - the input format to base the output format on
Returns:
  the output format
Throws:
  Exception - in case the determination goes wrong
See Also:
  batchFinished()
-
convertInstance
protected weka.core.Instance convertInstance(weka.core.Instance instance) throws Exception
Transform an instance in the original (unnormalized) format.
Parameters:
  instance - an instance in the original (unnormalized) format
Returns:
  a transformed instance
Throws:
  Exception - if the instance can't be transformed
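A minimal sketch of what transforming one instance can look like, assuming the standardizing mode and plain double arrays instead of weka.core.Instance: the row is standardized with statistics fitted on the training data, then each retained component score is the dot product of that row with one eigenvector. The layout `eigenvectors[attribute][component]` (one eigenvector per column) and the names `PcProjection`, `standardize` and `project` are assumptions; missing-value replacement, nominal-to-binary conversion and class handling are omitted.

```java
/**
 * Sketch (assumed, not the filter's source) of the per-instance transform:
 * standardize with training statistics, then project onto the retained
 * eigenvectors in order of decreasing eigenvalue.
 */
final class PcProjection {

  /** z_j = (x_j - mean_j) / sd_j, using statistics from the training data. */
  static double[] standardize(double[] x, double[] mean, double[] sd) {
    double[] z = new double[x.length];
    for (int j = 0; j < x.length; j++) {
      z[j] = (x[j] - mean[j]) / sd[j];
    }
    return z;
  }

  /**
   * Scores on the first k components listed in `order`;
   * eigenvectors[attribute][component] holds one eigenvector per column.
   */
  static double[] project(double[] z, double[][] eigenvectors, int[] order, int k) {
    double[] scores = new double[k];
    for (int c = 0; c < k; c++) {
      int comp = order[c];
      double s = 0.0;
      for (int a = 0; a < z.length; a++) {
        s += eigenvectors[a][comp] * z[a]; // dot product with eigenvector `comp`
      }
      scores[c] = s;
    }
    return scores;
  }
}
```

With the eigenvector (1/√2, 1/√2) first in `order` and (-1/√2, 1/√2) second, the standardized row (1, 1) scores √2 on the first component and 0 on the second.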
-
setup
protected void setup(weka.core.Instances instances) throws Exception
Initializes the filter with the given input data.
Parameters:
  instances - the data to process
Throws:
  Exception - in case the processing goes wrong
See Also:
  batchFinished()
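This page does not say which eigensolver `setup` relies on. As a stand-in, the textbook cyclic Jacobi method below diagonalizes a symmetric matrix such as the correlation or covariance matrix. It is a swapped-in technique, not the filter's own routine, and the name `JacobiEigen` is hypothetical. In keeping with the "unordered" description of m_Eigenvectors, it leaves the eigenpairs unsorted, so a separate sort (compare m_SortedEigens) is still needed.

```java
/**
 * Cyclic Jacobi eigenvalue method for a symmetric matrix (textbook technique,
 * used here only as a stand-in). `a` is overwritten: on return its diagonal
 * holds the eigenvalues, and column j of the returned matrix is the unit
 * eigenvector for a[j][j]. Eigenvalues are not sorted.
 */
final class JacobiEigen {

  static double[][] decompose(double[][] a) {
    int n = a.length;
    double[][] v = new double[n][n];
    for (int i = 0; i < n; i++) v[i][i] = 1.0; // V starts as the identity

    for (int sweep = 0; sweep < 100; sweep++) {
      // Stop once the off-diagonal part is negligible relative to the diagonal.
      double off = 0.0, diag = 0.0;
      for (int p = 0; p < n; p++) {
        diag += a[p][p] * a[p][p];
        for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
      }
      if (off == 0.0 || off <= 1e-24 * diag) break;

      for (int p = 0; p < n - 1; p++) {
        for (int q = p + 1; q < n; q++) {
          if (a[p][q] == 0.0) continue;
          // Rotation chosen so that the new a[p][q] becomes zero.
          double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
          double t = (theta >= 0 ? 1.0 : -1.0) / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
          double c = 1.0 / Math.sqrt(t * t + 1.0);
          double s = t * c;
          for (int k = 0; k < n; k++) { // A <- A P (columns p and q)
            double akp = a[k][p], akq = a[k][q];
            a[k][p] = c * akp - s * akq;
            a[k][q] = s * akp + c * akq;
          }
          for (int k = 0; k < n; k++) { // A <- P^T A (rows p and q)
            double apk = a[p][k], aqk = a[q][k];
            a[p][k] = c * apk - s * aqk;
            a[q][k] = s * apk + c * aqk;
          }
          for (int k = 0; k < n; k++) { // V <- V P accumulates the eigenvectors
            double vkp = v[k][p], vkq = v[k][q];
            v[k][p] = c * vkp - s * vkq;
            v[k][q] = s * vkp + c * vkq;
          }
        }
      }
    }
    return v;
  }
}
```

For the matrix [[2, 1], [1, 2]] one rotation suffices: the diagonal becomes (1, 3), and the eigenvector for 3 is (1, 1)/√2. Jacobi is simple and numerically robust, but each sweep costs O(n³), so QR-based solvers are usually preferred for large matrices.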
-
setInputFormat
public boolean setInputFormat(weka.core.Instances instanceInfo) throws Exception
Sets the format of the input instances.
Overrides:
  setInputFormat in class weka.filters.Filter
Parameters:
  instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
Returns:
  true if the outputFormat may be collected immediately
Throws:
  Exception - if the input format can't be set successfully
-
input
public boolean input(weka.core.Instance instance) throws Exception
Input an instance for filtering. The filter requires all training instances to be read before producing output.
Overrides:
  input in class weka.filters.Filter
Parameters:
  instance - the input instance
Returns:
  true if the filtered instance may now be collected with output().
Throws:
  IllegalStateException - if no input format has been set
  Exception - if conversion fails
-
batchFinished
public boolean batchFinished() throws Exception
Signify that this batch of input to the filter is finished.
Overrides:
  batchFinished in class weka.filters.Filter
Returns:
  true if there are instances pending output
Throws:
  NullPointerException - if no input structure has been defined
  Exception - if there was a problem finishing the batch.
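The input/batchFinished/output contract described for these two methods can be illustrated without Weka. The hypothetical class `BufferingFilter` below is not Weka's Filter API; it only mirrors the documented behavior: during the first batch, input() buffers rows and returns false, batchFinished() fits a model on all buffered rows and queues their transformed versions, and afterwards input() transforms each row immediately and returns true. Mean-centering stands in for the full PCA model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical mini-filter (not Weka's Filter class) mimicking the documented
 * batch contract. Mean-centering stands in for the PCA model, which, like PCA,
 * can only be fitted once every training row has been seen.
 */
final class BufferingFilter {
  private final List<double[]> buffer = new ArrayList<>();
  private final Queue<double[]> pending = new ArrayDeque<>();
  private double[] mean; // "model" fitted on the first batch
  private boolean firstBatchDone = false;

  /** Returns true if a filtered row can now be collected with output(). */
  boolean input(double[] row) {
    if (!firstBatchDone) {
      buffer.add(row); // training rows are only buffered
      return false;
    }
    pending.add(transform(row)); // later rows are transformed at once
    return true;
  }

  /** Fits the model on the buffered rows; returns true if output is pending. */
  boolean batchFinished() {
    if (!firstBatchDone && !buffer.isEmpty()) {
      int m = buffer.get(0).length;
      mean = new double[m];
      for (double[] r : buffer) {
        for (int j = 0; j < m; j++) mean[j] += r[j];
      }
      for (int j = 0; j < m; j++) mean[j] /= buffer.size();
      for (double[] r : buffer) pending.add(transform(r));
      buffer.clear();
      firstBatchDone = true;
    }
    return !pending.isEmpty();
  }

  /** Next filtered row, or null if none is pending. */
  double[] output() {
    return pending.poll();
  }

  private double[] transform(double[] row) {
    double[] out = new double[row.length];
    for (int j = 0; j < row.length; j++) out[j] = row[j] - mean[j];
    return out;
  }
}
```

The design point is that nothing can be emitted during the first batch, because the transformation itself depends on statistics of the whole training set; only after batchFinished() can instances flow through one at a time.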
-
getRevision
public String getRevision()
Returns the revision string.
Specified by:
  getRevision in interface weka.core.RevisionHandler
Overrides:
  getRevision in class weka.filters.Filter
Returns:
  the revision
-
main
public static void main(String[] args)
Main method for running this filter.
Parameters:
  args - should contain arguments to the filter: use -h for help