Class PrincipalComponentsJ
- java.lang.Object
-
- weka.filters.Filter
-
- weka.filters.unsupervised.attribute.PrincipalComponentsJ
-
- All Implemented Interfaces:
- Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
- Direct Known Subclasses:
- PublicPrincipalComponents
public class PrincipalComponentsJ extends weka.filters.Filter implements weka.core.OptionHandler, weka.filters.UnsupervisedFilter
Performs a principal components analysis and transformation of the data.
Dimensionality reduction is accomplished by choosing enough eigenvectors to account for some percentage of the variance in the original data (default: 0.95, i.e. 95%).
Based on code of the attribute selection scheme 'PrincipalComponents' by Mark Hall and Gabi Schmidberger.
Valid options are:
-C
  Center (rather than standardize) the data and compute PCA using the covariance (rather than the correlation) matrix.
-R <num>
  Retain enough PC attributes to account for this proportion of variance in the original data. (default: 0.95)
-A <num>
  Maximum number of attributes to include in transformed attribute names. (-1 = include all, default: 5)
-M <num>
  Maximum number of PC attributes to retain. (-1 = include all, default: -1)
-simple-attribute-names
  Whether to simply number the attributes instead of compiling them from other attribute names. (default: off)
- Version:
- $Revision: 12037 $
- Author:
- Mark Hall ([email protected]) -- attribute selection code, Gabi Schmidberger ([email protected]) -- attribute selection code, fracpete (fracpete at waikato dot ac dot nz) -- filter code
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected weka.filters.unsupervised.attribute.Remove | m_AttributeFilter | Filter for removing class attribute, nominal attributes with 0 or 1 value.
protected weka.filters.unsupervised.attribute.Center | m_centerFilter | Filter for centering the data.
protected int | m_ClassIndex | Class index.
protected double[][] | m_Correlation | Correlation matrix for the original data.
protected double | m_CoverVariance | The amount of variance to cover in the original data when retaining the best n PCs.
protected double[] | m_Eigenvalues | Eigenvalues for the corresponding eigenvectors.
protected double[][] | m_Eigenvectors | Will hold the unordered linear transformations of the (normalized) original data.
protected boolean | m_HasClass | Data has a class set.
protected int | m_MaxAttributes | Maximum number of attributes in the transformed data (-1 for all).
protected int | m_MaxAttrsInName | Maximum number of attributes in the transformed attribute name.
protected weka.filters.unsupervised.attribute.NominalToBinary | m_NominalToBinaryFilter | Filter for turning nominal values into numeric ones.
protected int | m_NumAttribs | Number of attributes.
protected int | m_NumInstances | Number of instances.
protected int | m_OutputNumAtts | The number of attributes in the PC-transformed data.
protected weka.filters.unsupervised.attribute.ReplaceMissingValues | m_ReplaceMissingFilter | Filter for replacing missing values.
protected boolean | m_SimpleAttributeNames | Whether to just number the attributes rather than compiling them from other attribute names.
protected int[] | m_SortedEigens | Sorted eigenvalues.
protected weka.filters.unsupervised.attribute.Standardize | m_standardizeFilter | Filter for standardizing the data.
protected double | m_SumOfEigenValues | Sum of the eigenvalues.
protected weka.core.Instances | m_TrainCopy | Keep a copy for the class attribute (if set).
protected weka.core.Instances | m_TrainInstances | The data to analyse/transform.
protected weka.core.Instances | m_TransformedFormat | The header for the transformed data format.
-
Constructor Summary
Constructors
Constructor | Description
PrincipalComponentsJ()
-
Method Summary
Modifier and Type | Method | Description
boolean | batchFinished() | Signify that this batch of input to the filter is finished.
String | centerDataTipText() | Returns the tip text for this property.
protected weka.core.Instance | convertInstance(weka.core.Instance instance) | Transform an instance in original (unnormalized) format.
protected weka.core.Instances | determineOutputFormat(weka.core.Instances inputFormat) | Determines the output format based on the input format and returns this.
protected void | fillCovariance() |
weka.core.Capabilities | getCapabilities() | Returns the capabilities of this filter.
boolean | getCenterData() | Get whether to center (rather than standardize) the data.
int | getMaximumAttributeNames() | Gets maximum number of attributes to include in transformed attribute names.
int | getMaximumAttributes() | Gets maximum number of PC attributes to retain.
String[] | getOptions() | Gets the current settings of the filter.
String | getRevision() | Returns the revision string.
boolean | getSimpleAttributeNames() | Get whether to just number the attributes rather than compiling names.
double | getVarianceCovered() | Gets the proportion of total variance to account for when retaining principal components.
String | globalInfo() | Returns a string describing this filter.
boolean | input(weka.core.Instance instance) | Input an instance for filtering.
Enumeration<weka.core.Option> | listOptions() | Returns an enumeration describing the available options.
static void | main(String[] args) | Main method for running this filter.
String | maximumAttributeNamesTipText() | Returns the tip text for this property.
String | maximumAttributesTipText() | Returns the tip text for this property.
void | setCenterData(boolean center) | Set whether to center (rather than standardize) the data.
boolean | setInputFormat(weka.core.Instances instanceInfo) | Sets the format of the input instances.
void | setMaximumAttributeNames(int value) | Sets maximum number of attributes to include in transformed attribute names.
void | setMaximumAttributes(int value) | Sets maximum number of PC attributes to retain.
void | setOptions(String[] options) | Parses a list of options for this object.
void | setSimpleAttributeNames(boolean value) | Set whether to just number the attributes rather than compiling names.
protected void | setup(weka.core.Instances instances) | Initializes the filter with the given input data.
void | setVarianceCovered(double value) | Sets the amount of variance to account for when retaining principal components.
String | simpleAttributeNamesTipText() | Returns the tip text for this property.
String | varianceCoveredTipText() | Returns the tip text for this property.
-
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
-
-
-
-
Field Detail
-
m_TrainInstances
protected weka.core.Instances m_TrainInstances
The data to analyse/transform.
-
m_TrainCopy
protected weka.core.Instances m_TrainCopy
Keep a copy for the class attribute (if set).
-
m_TransformedFormat
protected weka.core.Instances m_TransformedFormat
The header for the transformed data format.
-
m_HasClass
protected boolean m_HasClass
Data has a class set.
-
m_ClassIndex
protected int m_ClassIndex
Class index.
-
m_NumAttribs
protected int m_NumAttribs
Number of attributes.
-
m_NumInstances
protected int m_NumInstances
Number of instances.
-
m_Correlation
protected double[][] m_Correlation
Correlation matrix for the original data.
-
m_Eigenvectors
protected double[][] m_Eigenvectors
Will hold the unordered linear transformations of the (normalized) original data.
-
m_Eigenvalues
protected double[] m_Eigenvalues
Eigenvalues for the corresponding eigenvectors.
-
m_SortedEigens
protected int[] m_SortedEigens
Sorted eigenvalues.
-
m_SumOfEigenValues
protected double m_SumOfEigenValues
Sum of the eigenvalues.
-
m_ReplaceMissingFilter
protected weka.filters.unsupervised.attribute.ReplaceMissingValues m_ReplaceMissingFilter
Filter for replacing missing values.
-
m_NominalToBinaryFilter
protected weka.filters.unsupervised.attribute.NominalToBinary m_NominalToBinaryFilter
Filter for turning nominal values into numeric ones.
-
m_AttributeFilter
protected weka.filters.unsupervised.attribute.Remove m_AttributeFilter
Filter for removing class attribute, nominal attributes with 0 or 1 value.
-
m_standardizeFilter
protected weka.filters.unsupervised.attribute.Standardize m_standardizeFilter
Filter for standardizing the data.
-
m_centerFilter
protected weka.filters.unsupervised.attribute.Center m_centerFilter
Filter for centering the data.
-
m_OutputNumAtts
protected int m_OutputNumAtts
The number of attributes in the pc transformed data.
-
m_CoverVariance
protected double m_CoverVariance
The amount of variance to cover in the original data when retaining the best n PCs.
-
m_MaxAttrsInName
protected int m_MaxAttrsInName
Maximum number of attributes in the transformed attribute name.
-
m_MaxAttributes
protected int m_MaxAttributes
Maximum number of attributes in the transformed data (-1 for all).
-
m_SimpleAttributeNames
protected boolean m_SimpleAttributeNames
Whether to just number the attributes rather than compiling them from other attribute names.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter GUI
-
listOptions
public Enumeration<weka.core.Option> listOptions()
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface weka.core.OptionHandler
- Overrides:
- listOptions in class weka.filters.Filter
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a list of options for this object.
- Specified by:
- setOptions in interface weka.core.OptionHandler
- Overrides:
- setOptions in class weka.filters.Filter
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
-
getOptions
public String[] getOptions()
Gets the current settings of the filter.
- Specified by:
- getOptions in interface weka.core.OptionHandler
- Overrides:
- getOptions in class weka.filters.Filter
- Returns:
- an array of strings suitable for passing to setOptions
-
centerDataTipText
public String centerDataTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setCenterData
public void setCenterData(boolean center)
Set whether to center (rather than standardize) the data. If set to true, PCA is computed from the covariance matrix rather than the correlation matrix.
- Parameters:
- center - true if the data is to be centered rather than standardized
-
getCenterData
public boolean getCenterData()
Get whether to center (rather than standardize) the data. If true, PCA is computed from the covariance matrix rather than the correlation matrix.
- Returns:
- true if the data is to be centered rather than standardized.
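The difference between the two modes can be illustrated with a small, library-free sketch. It is not taken from this class: the class name `CenterVsStandardizeSketch`, its methods, and the choice of the sample divisor (n - 1) are assumptions made for illustration only. It shows that the correlation matrix is the covariance matrix rescaled by the standard deviations, which is what standardizing (rather than only centering) the data amounts to.

```java
/** Illustrative helper only; hypothetical and not part of Weka or PrincipalComponentsJ. */
class CenterVsStandardizeSketch {

    /** Column means of a row-major data matrix. */
    static double[] columnMeans(double[][] data) {
        int cols = data[0].length;
        double[] means = new double[cols];
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < cols; j++) {
            means[j] /= data.length;
        }
        return means;
    }

    /** Sample covariance matrix of the centered data (divisor n - 1 is an assumption). */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int cols = data[0].length;
        double[] means = columnMeans(data);
        double[][] cov = new double[cols][cols];
        for (double[] row : data) {
            for (int i = 0; i < cols; i++) {
                for (int j = 0; j < cols; j++) {
                    cov[i][j] += (row[i] - means[i]) * (row[j] - means[j]);
                }
            }
        }
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < cols; j++) {
                cov[i][j] /= (n - 1);
            }
        }
        return cov;
    }

    /** Correlation matrix: covariance divided by the product of standard deviations. */
    static double[][] correlation(double[][] data) {
        double[][] cov = covariance(data);
        int cols = cov.length;
        double[][] corr = new double[cols][cols];
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < cols; j++) {
                // A constant column (zero variance) would yield NaN here.
                corr[i][j] = cov[i][j] / Math.sqrt(cov[i][i] * cov[j][j]);
            }
        }
        return corr;
    }
}
```

With the covariance matrix, attributes measured on larger scales dominate the components; with the correlation matrix, every attribute contributes unit variance.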
-
varianceCoveredTipText
public String varianceCoveredTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setVarianceCovered
public void setVarianceCovered(double value)
Sets the proportion of total variance to account for when retaining principal components.
- Parameters:
- value - the proportion of total variance to account for
-
getVarianceCovered
public double getVarianceCovered()
Gets the proportion of total variance to account for when retaining principal components.
- Returns:
- the proportion of variance to account for
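As a rough illustration of what "retaining enough components to cover a proportion of variance" means, the following library-free sketch counts how many of the largest eigenvalues are needed to reach a target share of their sum. The class `VarianceCoveredSketch` and its method are hypothetical, and the exact stopping rule used by this filter (for example, how ties or rounding are handled) is not stated in this documentation.

```java
import java.util.Arrays;

/** Illustrative helper only; hypothetical and not part of Weka or PrincipalComponentsJ. */
class VarianceCoveredSketch {

    /**
     * Returns how many of the largest eigenvalues are needed so that their
     * cumulative share of the eigenvalue sum reaches varianceCovered.
     */
    static int componentsToRetain(double[] eigenvalues, double varianceCovered) {
        double[] sorted = eigenvalues.clone();
        Arrays.sort(sorted); // ascending order; the loop below walks from the end

        double total = 0;
        for (double v : sorted) {
            total += v;
        }

        double cumulative = 0;
        int retained = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            cumulative += sorted[i];
            retained++;
            if (cumulative / total >= varianceCovered) {
                break; // target proportion reached
            }
        }
        return retained;
    }
}
```

For eigenvalues 2.0, 1.0, 0.5 and 0.5 (sum 4.0), a target of 0.75 is reached after two components, while the default of 0.95 requires all four.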
-
maximumAttributeNamesTipText
public String maximumAttributeNamesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setMaximumAttributeNames
public void setMaximumAttributeNames(int value)
Sets maximum number of attributes to include in transformed attribute names.
- Parameters:
- value - the maximum number of attributes
-
getMaximumAttributeNames
public int getMaximumAttributeNames()
Gets maximum number of attributes to include in transformed attribute names.
- Returns:
- the maximum number of attributes
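The sketch below shows one plausible way a transformed attribute name could be compiled from the original attribute names while honouring such a limit. The naming format (coefficient followed by attribute name, largest absolute coefficients first, "..." when truncated) is an assumption for illustration and is not confirmed by this documentation; the class `AttributeNameSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

/** Illustrative helper only; hypothetical and not part of Weka or PrincipalComponentsJ. */
class AttributeNameSketch {

    /**
     * Builds a name such as "-0.700b+0.500c..." from one component's coefficients.
     * A negative maxAttrsInName means "include all attributes".
     */
    static String compileName(double[] coefficients, String[] names, int maxAttrsInName) {
        // Order attribute indices by descending absolute coefficient.
        Integer[] order = new Integer[coefficients.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) ->
                Double.compare(Math.abs(coefficients[b]), Math.abs(coefficients[a])));

        int limit = maxAttrsInName < 0
                ? order.length
                : Math.min(maxAttrsInName, order.length);

        StringBuilder name = new StringBuilder();
        for (int k = 0; k < limit; k++) {
            double c = coefficients[order[k]];
            if (k > 0 && c >= 0) {
                name.append('+'); // negative terms already carry their sign
            }
            name.append(String.format(Locale.ROOT, "%.3f", c)).append(names[order[k]]);
        }
        if (limit < order.length) {
            name.append("..."); // signal that some attributes were left out
        }
        return name.toString();
    }
}
```

Capping the number of terms keeps names readable when the input has many attributes; the simple-attribute-names option avoids compiled names altogether.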
-
maximumAttributesTipText
public String maximumAttributesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setMaximumAttributes
public void setMaximumAttributes(int value)
Sets maximum number of PC attributes to retain.
- Parameters:
- value - the maximum number of attributes
-
getMaximumAttributes
public int getMaximumAttributes()
Gets maximum number of PC attributes to retain.
- Returns:
- the maximum number of attributes
-
simpleAttributeNamesTipText
public String simpleAttributeNamesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setSimpleAttributeNames
public void setSimpleAttributeNames(boolean value)
Set whether to just number the attributes rather than compiling names.
- Parameters:
- value - true to just number the attributes
-
getSimpleAttributeNames
public boolean getSimpleAttributeNames()
Get whether to just number the attributes rather than compiling names.
- Returns:
- true if the attributes are just numbered
-
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the capabilities of this filter.
- Specified by:
- getCapabilities in interface weka.core.CapabilitiesHandler
- Overrides:
- getCapabilities in class weka.filters.Filter
- Returns:
- the capabilities of this filter
- See Also:
- Capabilities
-
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns this. In case the output format cannot be returned immediately, i.e., immediateOutputFormat() returns false, then this method will be called from batchFinished().
- Parameters:
- inputFormat - the input format to base the output format on
- Returns:
- the output format
- Throws:
- Exception - in case the determination goes wrong
- See Also:
- batchFinished()
-
convertInstance
protected weka.core.Instance convertInstance(weka.core.Instance instance) throws Exception
Transform an instance in original (unnormalized) format.
- Parameters:
- instance - an instance in the original (unnormalized) format
- Returns:
- a transformed instance
- Throws:
- Exception - if instance can't be transformed
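Conceptually, transforming an unnormalized instance means normalizing it with the statistics learned from the training data and then projecting it onto the retained eigenvectors. The following library-free sketch illustrates that idea on plain arrays. It is not this class's implementation: `ProjectionSketch`, its parameters, and the column-per-component matrix layout are assumptions for illustration.

```java
/** Illustrative helper only; hypothetical and not part of Weka or PrincipalComponentsJ. */
class ProjectionSketch {

    /**
     * Projects one raw row onto the given components.
     *
     * @param row          raw attribute values
     * @param means        per-attribute training means
     * @param stdDevs      per-attribute training standard deviations (unused when centerOnly)
     * @param eigenvectors eigenvectors[j][k] is the weight of attribute j in component k
     *                     (this layout is an assumption)
     * @param centerOnly   true to center only, false to standardize
     */
    static double[] project(double[] row, double[] means, double[] stdDevs,
                            double[][] eigenvectors, boolean centerOnly) {
        int numAttribs = row.length;
        int numComponents = eigenvectors[0].length;

        // Step 1: normalize the raw values with the training statistics.
        double[] normalized = new double[numAttribs];
        for (int j = 0; j < numAttribs; j++) {
            double centered = row[j] - means[j];
            normalized[j] = centerOnly ? centered : centered / stdDevs[j];
        }

        // Step 2: each output value is the dot product with one eigenvector.
        double[] transformed = new double[numComponents];
        for (int k = 0; k < numComponents; k++) {
            for (int j = 0; j < numAttribs; j++) {
                transformed[k] += eigenvectors[j][k] * normalized[j];
            }
        }
        return transformed;
    }
}
```

Passing fewer columns in `eigenvectors` than there are attributes yields the reduced-dimension output.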
-
setup
protected void setup(weka.core.Instances instances) throws Exception
Initializes the filter with the given input data.
- Parameters:
- instances - the data to process
- Throws:
- Exception - in case the processing goes wrong
- See Also:
- batchFinished()
-
setInputFormat
public boolean setInputFormat(weka.core.Instances instanceInfo) throws Exception
Sets the format of the input instances.
- Overrides:
- setInputFormat in class weka.filters.Filter
- Parameters:
- instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
- Returns:
- true if the outputFormat may be collected immediately
- Throws:
- Exception - if the input format can't be set successfully
-
input
public boolean input(weka.core.Instance instance) throws Exception
Input an instance for filtering. The filter requires all training instances to be read before producing output.
- Overrides:
- input in class weka.filters.Filter
- Parameters:
- instance - the input instance
- Returns:
- true if the filtered instance may now be collected with output().
- Throws:
- IllegalStateException - if no input format has been set
- Exception - if conversion fails
-
batchFinished
public boolean batchFinished() throws Exception
Signify that this batch of input to the filter is finished.
- Overrides:
- batchFinished in class weka.filters.Filter
- Returns:
- true if there are instances pending output
- Throws:
- NullPointerException - if no input structure has been defined
- Exception - if there was a problem finishing the batch.
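The interplay of input(), batchFinished() and output() for a filter that must see the whole first batch before it can emit anything can be sketched without Weka as follows. This toy class only centers rows and is not this filter's implementation; `BatchCenteringSketch` and its array-based methods are hypothetical stand-ins for the Instance-based API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy batch filter; hypothetical and not part of Weka or PrincipalComponentsJ. */
class BatchCenteringSketch {
    private final List<double[]> buffer = new ArrayList<>();
    private final Queue<double[]> pending = new ArrayDeque<>();
    private double[] means; // null until the first batch has been finished

    /** Returns true if a transformed row can be collected right away. */
    boolean input(double[] row) {
        if (means == null) {
            buffer.add(row.clone()); // first batch: only collect, nothing to output yet
            return false;
        }
        pending.add(center(row)); // later batches: statistics are already known
        return true;
    }

    /** Ends the current batch; returns true if rows are pending output. */
    boolean batchFinished() {
        if (means == null) {
            if (buffer.isEmpty()) {
                throw new IllegalStateException("No rows were input before batchFinished()");
            }
            means = new double[buffer.get(0).length];
            for (double[] row : buffer) {
                for (int j = 0; j < means.length; j++) {
                    means[j] += row[j] / buffer.size();
                }
            }
            for (double[] row : buffer) {
                pending.add(center(row));
            }
            buffer.clear();
        }
        return !pending.isEmpty();
    }

    /** Returns the next transformed row, or null if none is pending. */
    double[] output() {
        return pending.poll();
    }

    private double[] center(double[] row) {
        double[] result = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            result[j] = row[j] - means[j];
        }
        return result;
    }
}
```

Rows passed in before the first batchFinished() act as training data; rows passed in afterwards are transformed immediately with the statistics learned from that first batch.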
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
- getRevision in interface weka.core.RevisionHandler
- Overrides:
- getRevision in class weka.filters.Filter
- Returns:
- the revision
-
main
public static void main(String[] args)
Main method for running this filter.
- Parameters:
- args - should contain arguments to the filter: use -h for help
-
-