Class PrincipalComponentsJ

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
    Direct Known Subclasses:
    PublicPrincipalComponents

    public class PrincipalComponentsJ
    extends weka.filters.Filter
    implements weka.core.OptionHandler, weka.filters.UnsupervisedFilter
    Performs a principal components analysis and transformation of the data.
    Dimensionality reduction is accomplished by choosing enough eigenvectors to account for some percentage of the variance in the original data (default: 0.95, i.e. 95%).
    Based on code of the attribute selection scheme 'PrincipalComponents' by Mark Hall and Gabi Schmidberger.

    Valid options are:

     -C
      Center (rather than standardize) the data and compute PCA using
      the covariance (rather than the correlation) matrix.

     -R <num>
      Retain enough PC attributes to account for this proportion of
      variance in the original data.
      (default: 0.95)

     -A <num>
      Maximum number of attributes to include in transformed attribute names.
      (-1 = include all, default: 5)

     -M <num>
      Maximum number of PC attributes to retain.
      (-1 = include all, default: -1)

     -simple-attribute-names
      Whether to simply number the attributes instead of compiling
      them from other attribute names.
      (default: off)

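    The retention rule behind -R and -M can be sketched in plain Java. This is a minimal illustration, not the filter's actual code: the class VarianceCoverSketch and the method componentsToRetain are hypothetical names, and the sketch assumes non-negative eigenvalues and that the -M cap takes precedence over the variance target.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of PrincipalComponentsJ. */
final class VarianceCoverSketch {

    /**
     * Counts how many of the largest eigenvalues are needed so that their
     * cumulative share of the eigenvalue sum reaches coverVariance.
     * The count is capped at maxAttributes; -1 means no cap.
     */
    static int componentsToRetain(double[] eigenvalues, double coverVariance, int maxAttributes) {
        double[] sorted = eigenvalues.clone();
        Arrays.sort(sorted); // ascending, so the largest value is last

        double total = 0.0;
        for (double v : sorted) {
            total += v;
        }

        int retained = 0;
        double cumulative = 0.0;
        // Walk from the largest eigenvalue down to the smallest.
        for (int i = sorted.length - 1; i >= 0; i--) {
            if (maxAttributes >= 0 && retained >= maxAttributes) {
                break; // cap reached (assumed to win over the variance target)
            }
            cumulative += sorted[i];
            retained++;
            if (cumulative / total >= coverVariance) {
                break; // enough variance is accounted for
            }
        }
        return retained;
    }
}
```

    For eigenvalues {2.5, 1.0, 0.4, 0.1} the cumulative shares are 0.625, 0.875, 0.975 and 1.0, so a target of 0.95 keeps three components, and a cap of 2 reduces that to two.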
    Version:
    $Revision: 12037 $
    Author:
    Mark Hall (mhall@cs.waikato.ac.nz) -- attribute selection code, Gabi Schmidberger (gabi@cs.waikato.ac.nz) -- attribute selection code, fracpete (fracpete at waikato dot ac dot nz) -- filter code
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected weka.filters.unsupervised.attribute.Remove m_AttributeFilter
      Filter for removing the class attribute and nominal attributes with 0 or 1 values.
      protected weka.filters.unsupervised.attribute.Center m_centerFilter
      Filter for centering the data.
      protected int m_ClassIndex
      Class index.
      protected double[][] m_Correlation
      Correlation matrix for the original data.
      protected double m_CoverVariance
      The amount of variance to cover in the original data when retaining the best n PCs.
      protected double[] m_Eigenvalues
      Eigenvalues for the corresponding eigenvectors.
      protected double[][] m_Eigenvectors
      Will hold the unordered linear transformations of the (normalized) original data.
      protected boolean m_HasClass
      Data has a class set.
      protected int m_MaxAttributes
      Maximum number of attributes in the transformed data (-1 for all).
      protected int m_MaxAttrsInName
      Maximum number of attributes in the transformed attribute name.
      protected weka.filters.unsupervised.attribute.NominalToBinary m_NominalToBinaryFilter
      Filter for turning nominal values into numeric ones.
      protected int m_NumAttribs
      Number of attributes.
      protected int m_NumInstances
      Number of instances.
      protected int m_OutputNumAtts
      The number of attributes in the pc transformed data.
      protected weka.filters.unsupervised.attribute.ReplaceMissingValues m_ReplaceMissingFilter
      Filter for replacing missing values.
      protected boolean m_SimpleAttributeNames
      Whether to just number the attributes rather than compiling them from other attribute names.
      protected int[] m_SortedEigens
      Indices of the eigenvalues in sorted order.
      protected weka.filters.unsupervised.attribute.Standardize m_standardizeFilter
      Filter for standardizing the data.
      protected double m_SumOfEigenValues
      Sum of the eigenvalues.
      protected weka.core.Instances m_TrainCopy
      Keep a copy for the class attribute (if set).
      protected weka.core.Instances m_TrainInstances
      The data to analyse/transform.
      protected weka.core.Instances m_TransformedFormat
      The header for the transformed data format.
      • Fields inherited from class weka.filters.Filter

        m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
    • Method Summary

      Modifier and Type Method Description
      boolean batchFinished()
      Signify that this batch of input to the filter is finished.
      String centerDataTipText()
      Returns the tip text for this property.
      protected weka.core.Instance convertInstance​(weka.core.Instance instance)
      Transform an instance in original (unnormalized) format.
      protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns this.
      protected void fillCovariance()  
      weka.core.Capabilities getCapabilities()
      Returns the capabilities of this filter.
      boolean getCenterData()
      Get whether to center (rather than standardize) the data.
      int getMaximumAttributeNames()
      Gets maximum number of attributes to include in transformed attribute names.
      int getMaximumAttributes()
      Gets maximum number of PC attributes to retain.
      String[] getOptions()
      Gets the current settings of the filter.
      String getRevision()
      Returns the revision string.
      boolean getSimpleAttributeNames()
      Get whether to just number the attributes rather than compiling names.
      double getVarianceCovered()
      Gets the proportion of total variance to account for when retaining principal components.
      String globalInfo()
      Returns a string describing this filter.
      boolean input​(weka.core.Instance instance)
      Input an instance for filtering.
      Enumeration<weka.core.Option> listOptions()
      Returns an enumeration describing the available options.
      static void main​(String[] args)
      Main method for running this filter.
      String maximumAttributeNamesTipText()
      Returns the tip text for this property.
      String maximumAttributesTipText()
      Returns the tip text for this property.
      void setCenterData​(boolean center)
      Set whether to center (rather than standardize) the data.
      boolean setInputFormat​(weka.core.Instances instanceInfo)
      Sets the format of the input instances.
      void setMaximumAttributeNames​(int value)
      Sets maximum number of attributes to include in transformed attribute names.
      void setMaximumAttributes​(int value)
      Sets maximum number of PC attributes to retain.
      void setOptions​(String[] options)
      Parses a list of options for this object.
      void setSimpleAttributeNames​(boolean value)
      Set whether to just number the attributes rather than compiling names.
      protected void setup​(weka.core.Instances instances)
      Initializes the filter with the given input data.
      void setVarianceCovered​(double value)
      Sets the amount of variance to account for when retaining principal components.
      String simpleAttributeNamesTipText()
      Returns the tip text for this property.
      String varianceCoveredTipText()
      Returns the tip text for this property.
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • m_TrainInstances

        protected weka.core.Instances m_TrainInstances
        The data to analyse/transform.
      • m_TrainCopy

        protected weka.core.Instances m_TrainCopy
        Keep a copy for the class attribute (if set).
      • m_TransformedFormat

        protected weka.core.Instances m_TransformedFormat
        The header for the transformed data format.
      • m_HasClass

        protected boolean m_HasClass
        Data has a class set.
      • m_ClassIndex

        protected int m_ClassIndex
        Class index.
      • m_NumAttribs

        protected int m_NumAttribs
        Number of attributes.
      • m_NumInstances

        protected int m_NumInstances
        Number of instances.
      • m_Correlation

        protected double[][] m_Correlation
        Correlation matrix for the original data.
      • m_Eigenvectors

        protected double[][] m_Eigenvectors
        Will hold the unordered linear transformations of the (normalized) original data.
      • m_Eigenvalues

        protected double[] m_Eigenvalues
        Eigenvalues for the corresponding eigenvectors.
      • m_SortedEigens

        protected int[] m_SortedEigens
        Indices of the eigenvalues in sorted order.
      • m_SumOfEigenValues

        protected double m_SumOfEigenValues
        Sum of the eigenvalues.
      • m_ReplaceMissingFilter

        protected weka.filters.unsupervised.attribute.ReplaceMissingValues m_ReplaceMissingFilter
        Filter for replacing missing values.
      • m_NominalToBinaryFilter

        protected weka.filters.unsupervised.attribute.NominalToBinary m_NominalToBinaryFilter
        Filter for turning nominal values into numeric ones.
      • m_AttributeFilter

        protected weka.filters.unsupervised.attribute.Remove m_AttributeFilter
        Filter for removing the class attribute and nominal attributes with 0 or 1 values.
      • m_standardizeFilter

        protected weka.filters.unsupervised.attribute.Standardize m_standardizeFilter
        Filter for standardizing the data.
      • m_centerFilter

        protected weka.filters.unsupervised.attribute.Center m_centerFilter
        Filter for centering the data.
      • m_OutputNumAtts

        protected int m_OutputNumAtts
        The number of attributes in the pc transformed data.
      • m_CoverVariance

        protected double m_CoverVariance
        The amount of variance to cover in the original data when retaining the best n PCs.
      • m_MaxAttrsInName

        protected int m_MaxAttrsInName
        Maximum number of attributes in the transformed attribute name.
      • m_MaxAttributes

        protected int m_MaxAttributes
        Maximum number of attributes in the transformed data (-1 for all).
      • m_SimpleAttributeNames

        protected boolean m_SimpleAttributeNames
        Whether to just number the attributes rather than compiling them from other attribute names.
    • Constructor Detail

      • PrincipalComponentsJ

        public PrincipalComponentsJ()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • listOptions

        public Enumeration<weka.core.Option> listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions​(String[] options)
                        throws Exception
        Parses a list of options for this object.
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • getOptions

        public String[] getOptions()
        Gets the current settings of the filter.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        an array of strings suitable for passing to setOptions
      • centerDataTipText

        public String centerDataTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setCenterData

        public void setCenterData​(boolean center)
        Set whether to center (rather than standardize) the data. If set to true then PCA is computed from the covariance rather than correlation matrix.
        Parameters:
        center - true if the data is to be centered rather than standardized
      • getCenterData

        public boolean getCenterData()
        Get whether to center (rather than standardize) the data. If true then PCA is computed from the covariance rather than correlation matrix.
        Returns:
        true if the data is to be centered rather than standardized.
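        The practical difference between the two modes is which matrix the eigen-decomposition runs on. The following plain-Java sketch computes both matrices for a small data set; it is an illustration under stated assumptions, not the filter's implementation. The class CovarianceVsCorrelationSketch and its methods are hypothetical names, the input is assumed to be a dense, purely numeric array with at least two rows and no constant columns, and the sample (n - 1) denominator is an assumption.

```java
/** Hypothetical helper for illustration; not part of PrincipalComponentsJ. */
final class CovarianceVsCorrelationSketch {

    /** Sample covariance matrix of the columns of data (one row per instance). */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int d = data[0].length;

        // Column means: this is the "centering" step.
        double[] mean = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mean[j] /= n;
        }

        // Accumulate products of centered values.
        double[][] cov = new double[d][d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                for (int k = 0; k < d; k++) {
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]);
                }
            }
        }
        for (int j = 0; j < d; j++) {
            for (int k = 0; k < d; k++) {
                cov[j][k] /= (n - 1);
            }
        }
        return cov;
    }

    /** Correlation matrix: the covariance rescaled by the standard deviations. */
    static double[][] correlation(double[][] data) {
        double[][] cov = covariance(data);
        int d = cov.length;
        double[][] corr = new double[d][d];
        for (int j = 0; j < d; j++) {
            for (int k = 0; k < d; k++) {
                corr[j][k] = cov[j][k] / Math.sqrt(cov[j][j] * cov[k][k]);
            }
        }
        return corr;
    }
}
```

        For the rows {1, 10}, {2, 20}, {3, 30} the covariance matrix has variances 1 and 100 on its diagonal, so the second attribute dominates a covariance-based PCA purely because of its scale, while the correlation matrix has ones on the diagonal and treats both attributes equally.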
      • varianceCoveredTipText

        public String varianceCoveredTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setVarianceCovered

        public void setVarianceCovered​(double value)
        Sets the amount of variance to account for when retaining principal components.
        Parameters:
        value - the proportion of total variance to account for
      • getVarianceCovered

        public double getVarianceCovered()
        Gets the proportion of total variance to account for when retaining principal components.
        Returns:
        the proportion of variance to account for
      • maximumAttributeNamesTipText

        public String maximumAttributeNamesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMaximumAttributeNames

        public void setMaximumAttributeNames​(int value)
        Sets maximum number of attributes to include in transformed attribute names.
        Parameters:
        value - the maximum number of attributes
      • getMaximumAttributeNames

        public int getMaximumAttributeNames()
        Gets maximum number of attributes to include in transformed attribute names.
        Returns:
        the maximum number of attributes
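        To make the effect of this limit concrete, here is a sketch of how a transformed attribute name could be compiled from a component's coefficients. The naming format is an assumption: this page does not specify it, so the three-decimal coefficients, the ordering by absolute coefficient and the trailing "..." are illustrative choices only. ComponentNameSketch and its method are hypothetical names, not part of this class.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

/** Hypothetical helper for illustration; the name format is assumed. */
final class ComponentNameSketch {

    /**
     * Builds a name such as "-0.700b+0.500c..." from one component's
     * coefficients, listing at most maxAttrsInName source attributes
     * (-1 lists all of them).
     */
    static String name(double[] coefficients, String[] attributeNames, int maxAttrsInName) {
        Integer[] order = new Integer[coefficients.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Largest absolute coefficient first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -Math.abs(coefficients[i])));

        int limit = (maxAttrsInName < 0)
                ? order.length
                : Math.min(maxAttrsInName, order.length);

        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < limit; j++) {
            double c = coefficients[order[j]];
            if (j > 0 && c >= 0) {
                sb.append('+'); // negative values carry their own sign
            }
            sb.append(String.format(Locale.ROOT, "%.3f", c));
            sb.append(attributeNames[order[j]]);
        }
        if (limit < order.length) {
            sb.append("..."); // signal that some attributes were omitted
        }
        return sb.toString();
    }
}
```

        With coefficients {0.2, -0.7, 0.5} for attributes a, b and c, a limit of 2 yields "-0.700b+0.500c..." and a limit of -1 yields "-0.700b+0.500c+0.200a". Under the simple-attribute-names setting the attributes are just numbered instead.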
      • maximumAttributesTipText

        public String maximumAttributesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMaximumAttributes

        public void setMaximumAttributes​(int value)
        Sets maximum number of PC attributes to retain.
        Parameters:
        value - the maximum number of attributes
      • getMaximumAttributes

        public int getMaximumAttributes()
        Gets maximum number of PC attributes to retain.
        Returns:
        the maximum number of attributes
      • simpleAttributeNamesTipText

        public String simpleAttributeNamesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSimpleAttributeNames

        public void setSimpleAttributeNames​(boolean value)
        Set whether to just number the attributes rather than compiling names.
        Parameters:
        value - true if the attributes should just be numbered
      • getSimpleAttributeNames

        public boolean getSimpleAttributeNames()
        Get whether to just number the attributes rather than compiling names.
        Returns:
        true if the attributes are just numbered
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the capabilities of this filter.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this filter
        See Also:
        Capabilities
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns this. In case the output format cannot be returned immediately, i.e., immediateOutputFormat() returns false, then this method will be called from batchFinished().
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
        See Also:
        batchFinished()
      • convertInstance

        protected weka.core.Instance convertInstance​(weka.core.Instance instance)
                                              throws Exception
        Transform an instance in original (unnormalized) format.
        Parameters:
        instance - an instance in the original (unnormalized) format
        Returns:
        a transformed instance
        Throws:
        Exception - if instance can't be transformed
      • setup

        protected void setup​(weka.core.Instances instances)
                      throws Exception
        Initializes the filter with the given input data.
        Parameters:
        instances - the data to process
        Throws:
        Exception - in case the processing goes wrong
        See Also:
        batchFinished()
      • setInputFormat

        public boolean setInputFormat​(weka.core.Instances instanceInfo)
                               throws Exception
        Sets the format of the input instances.
        Overrides:
        setInputFormat in class weka.filters.Filter
        Parameters:
        instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
        Returns:
        true if the outputFormat may be collected immediately
        Throws:
        Exception - if the input format can't be set successfully
      • input

        public boolean input​(weka.core.Instance instance)
                      throws Exception
        Input an instance for filtering. The filter requires all training instances to be read before producing output.
        Overrides:
        input in class weka.filters.Filter
        Parameters:
        instance - the input instance
        Returns:
        true if the filtered instance may now be collected with output().
        Throws:
        IllegalStateException - if no input format has been set
        Exception - if conversion fails
      • batchFinished

        public boolean batchFinished()
                              throws Exception
        Signify that this batch of input to the filter is finished.
        Overrides:
        batchFinished in class weka.filters.Filter
        Returns:
        true if there are instances pending output
        Throws:
        NullPointerException - if no input structure has been defined.
        Exception - if there was a problem finishing the batch.
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main​(String[] args)
        Main method for running this filter.
        Parameters:
        args - should contain arguments to the filter: use -h for help