Class PCA

  • All Implemented Interfaces:
    Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, CapabilitiesHandler, BatchFilter, ColumnSubsetFilter, Filter, Serializable

    public class PCA
    extends AbstractColumnSubsetBatchFilter
    Performs principal components analysis (PCA) on the selected subset of columns.

    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -column-subset <RANGE|REGEXP> (property: columnSubset)
        Defines how to determine the columns to use for filtering.
        default: RANGE
     
    -col-range <adams.data.spreadsheet.SpreadSheetColumnRange> (property: colRange)
        The range of columns to use in the filtering process.
        default: first-last
        example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; column names (case-sensitive) as well as the following placeholders can be used: first, second, third, last_2, last_1, last; numeric indices can be enforced by preceding them with '#' (eg '#12'); column names can be surrounded by double quotes.
     
    -col-regexp <adams.core.base.BaseRegExp> (property: colRegExp)
        The regular expression to use on the column names to determine whether to
        use a column for filtering.
        default: .*
        more: https://docs.oracle.com/javase/tutorial/essential/regex/
        https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
     
    -drop-other-columns <boolean> (property: dropOtherColumns)
        If enabled, other columns that aren't used for filtering get removed from
        the output; does not affect any class columns.
        default: false
     
    -variance <double> (property: variance)
        The proportion of the total variance (between 0 and 1) that the retained components should cover.
        default: 0.95
        minimum: 0.0
        maximum: 1.0
     
    -max-columns <int> (property: maxColumns)
        The maximum number of columns to generate.
        default: -1
        minimum: -1
     
    -center <boolean> (property: center)
        If enabled, the data is centered rather than standardized, so PCA is
        computed from the covariance matrix rather than the correlation matrix.
        default: false
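The column-subset options decide which columns enter the PCA. The following Java sketch mimics the REGEXP mode only, using `java.util.regex`. The helper name `selectColumns` is hypothetical, and the sketch assumes that a column is selected when its whole name matches the expression (`Matcher.matches()`), which may differ from how the filter actually applies `colRegExp`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ColumnRegExpSketch {

    /**
     * Hypothetical helper: returns the 0-based indices of all columns whose
     * name fully matches the regular expression (assumes Matcher.matches()
     * semantics rather than a partial find()).
     */
    static List<Integer> selectColumns(List<String> names, String regExp) {
        Pattern pattern = Pattern.compile(regExp);
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            if (pattern.matcher(names.get(i)).matches())
                selected.add(i);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = List.of("sepal_length", "sepal_width", "petal_length", "petal_width");
        // the default ".*" keeps every column
        System.out.println(selectColumns(names, ".*"));        // [0, 1, 2, 3]
        // only the petal measurements would go into the PCA
        System.out.println(selectColumns(names, "petal_.*"));  // [2, 3]
    }
}
```

With `dropOtherColumns` left at its default of false, the columns that are not selected stay in the output.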
     
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • m_Variance

        protected double m_Variance
        the variance to cover.
      • m_MaxColumns

        protected int m_MaxColumns
        the maximum number of columns to generate.
      • m_Center

        protected boolean m_Center
        whether to center (rather than standardize) the data and compute PCA from the covariance (rather than the correlation) matrix.
      • m_Algorithm

        protected com.github.waikatodatamining.matrix.algorithm.PCA m_Algorithm
        the actual algorithm.
      • m_NumColumns

        protected int m_NumColumns
        the number of columns that was determined.
      • m_Transformed

        protected transient com.github.waikatodatamining.matrix.core.Matrix m_Transformed
        temporary matrix, kept to avoid transforming the data twice.
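The `m_Variance`, `m_MaxColumns` and `m_NumColumns` fields relate to choosing how many components to keep. The sketch below illustrates one common way of doing this, under stated assumptions: the eigenvalues of the covariance or correlation matrix are already known, components are taken in order of decreasing eigenvalue until their cumulative share of the total variance reaches the `variance` threshold, and the count is then capped by `maxColumns`, treating a negative value such as the default -1 as "no limit". The helper `numComponents` is hypothetical and is not the filter's actual code.

```java
import java.util.Arrays;

public class ComponentCountSketch {

    /**
     * Hypothetical helper: number of components needed so that their
     * cumulative share of the total variance reaches 'variance', capped at
     * 'maxColumns'. A negative maxColumns (e.g., the default -1) is assumed
     * to mean "no limit".
     */
    static int numComponents(double[] eigenvalues, double variance, int maxColumns) {
        double[] sorted = eigenvalues.clone();
        Arrays.sort(sorted); // ascending order, so iterate from the end
        double total = 0.0;
        for (double value : sorted)
            total += value;
        int count = 0;
        double covered = 0.0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            covered += sorted[i];
            count++;
            if (covered / total >= variance)
                break;
        }
        if (maxColumns >= 0 && count > maxColumns)
            count = maxColumns;
        return count;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {4.0, 2.0, 1.0, 1.0}; // total variance: 8.0
        // cumulative shares: 0.5, 0.75, 0.875, 1.0
        System.out.println(numComponents(eigenvalues, 0.95, -1)); // 4
        System.out.println(numComponents(eigenvalues, 0.70, -1)); // 2 (0.75 >= 0.70)
        System.out.println(numComponents(eigenvalues, 0.95, 3));  // 3 (capped by maxColumns)
    }
}
```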
    • Constructor Detail

      • PCA

        public PCA()
    • Method Detail

      • setVariance

        public void setVariance(double value)
        Sets the variance.
        Parameters:
        value - the variance
      • getVariance

        public double getVariance()
        Returns the variance.
        Returns:
        the variance
      • varianceTipText

        public String varianceTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setMaxColumns

        public void setMaxColumns(int value)
        Sets the maximum number of columns to generate.
        Parameters:
        value - the maximum number of columns
      • getMaxColumns

        public int getMaxColumns()
        Returns the maximum number of columns to generate.
        Returns:
        the maximum number of columns
      • maxColumnsTipText

        public String maxColumnsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setCenter

        public void setCenter(boolean center)
        Sets whether to center (rather than standardize) the data. If true, PCA is computed from the covariance matrix rather than the correlation matrix.
        Parameters:
        center - true if the data is to be centered rather than standardized
      • getCenter

        public boolean getCenter()
        Returns whether to center (rather than standardize) the data. If true, PCA is computed from the covariance matrix rather than the correlation matrix.
        Returns:
        true if the data is to be centered rather than standardized.
      • centerTipText

        public String centerTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getCapabilities

        public Capabilities getCapabilities()
        Returns the capabilities.
        Returns:
        the capabilities
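The `center` property switches between the covariance and the correlation matrix. The following sketch illustrates why, using the hypothetical helpers `preprocess` and `covariance`: it centers or standardizes each column and then computes the sample covariance matrix. The covariance of standardized data equals the correlation matrix of the original data, whereas centering alone keeps each column on its original scale. The sample (n - 1) denominator and the handling of constant columns are assumptions of this sketch, not details taken from the filter.

```java
import java.util.Arrays;

public class CenterVsStandardizeSketch {

    /**
     * Subtracts each column's mean. If standardize is true, each centered
     * column is also divided by its sample standard deviation (constant
     * columns, whose standard deviation is 0, are only centered).
     */
    static double[][] preprocess(double[][] data, boolean standardize) {
        int rows = data.length;
        int cols = data[0].length;
        double[][] result = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            double mean = 0.0;
            for (double[] row : data)
                mean += row[j];
            mean /= rows;
            double sumSq = 0.0;
            for (double[] row : data)
                sumSq += (row[j] - mean) * (row[j] - mean);
            double stdDev = Math.sqrt(sumSq / (rows - 1));
            for (int i = 0; i < rows; i++) {
                double centered = data[i][j] - mean;
                result[i][j] = (standardize && stdDev > 0) ? centered / stdDev : centered;
            }
        }
        return result;
    }

    /** Sample covariance matrix (n - 1 denominator) of column-centered data. */
    static double[][] covariance(double[][] centered) {
        int rows = centered.length;
        int cols = centered[0].length;
        double[][] cov = new double[cols][cols];
        for (int a = 0; a < cols; a++) {
            for (int b = 0; b < cols; b++) {
                double sum = 0.0;
                for (double[] row : centered)
                    sum += row[a] * row[b];
                cov[a][b] = sum / (rows - 1);
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 10}, {2, 30}, {3, 20}, {4, 40}};
        // center = true: covariance matrix, approximately [[1.667, 13.333], [13.333, 166.667]]
        System.out.println(Arrays.deepToString(covariance(preprocess(data, false))));
        // center = false: correlation matrix, approximately [[1.0, 0.8], [0.8, 1.0]]
        System.out.println(Arrays.deepToString(covariance(preprocess(data, true))));
    }
}
```

Because the correlation matrix puts every column on unit variance, standardizing (the default) keeps columns with large numeric ranges from dominating the leading components.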