Class PCA
-
- All Implemented Interfaces:
Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, CapabilitiesHandler, BatchFilter, ColumnSubsetFilter, Filter, Serializable
public class PCA extends AbstractColumnSubsetBatchFilter
Performs principal components analysis.
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-column-subset <RANGE|REGEXP> (property: columnSubset) Defines how to determine the columns to use for filtering. default: RANGE
-col-range <adams.data.spreadsheet.SpreadSheetColumnRange> (property: colRange) The range of columns to use in the filtering process. default: first-last example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; column names (case-sensitive) as well as the following placeholders can be used: first, second, third, last_2, last_1, last; numeric indices can be enforced by preceding them with '#' (eg '#12'); column names can be surrounded by double quotes.
-col-regexp <adams.core.base.BaseRegExp> (property: colRegExp) The regular expression to use on the column names to determine whether to use a column for filtering. default: .* more: https://docs.oracle.com/javase/tutorial/essential/regex/ https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
-drop-other-columns <boolean> (property: dropOtherColumns) If enabled, other columns that aren't used for filtering get removed from the output; does not affect any class columns. default: false
-variance <double> (property: variance) The variance to cover. default: 0.95 minimum: 0.0 maximum: 1.0
-max-columns <int> (property: maxColumns) The maximum number of columns to generate. default: -1 minimum: -1
-center <boolean> (property: center) If enabled, the data gets centered rather than standardized, computing PCA from covariance matrix rather than correlation matrix. default: false
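As an illustration of the -col-regexp option listed above, the following minimal sketch selects columns by name using java.util.regex.Pattern. The class ColumnRegexSketch and the method matchingColumns are hypothetical names and not part of the documented API. The sketch also assumes that a column name has to match the expression in full, which this page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the documented API. */
public class ColumnRegexSketch {

  /**
   * Returns the 0-based indices of all column names that match the
   * regular expression in full (assumption: full match, not partial).
   */
  public static List<Integer> matchingColumns(List<String> names, String regexp) {
    Pattern pattern = Pattern.compile(regexp);
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < names.size(); i++) {
      if (pattern.matcher(names.get(i)).matches())
        result.add(i);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("id", "feat_1", "feat_2", "label");
    // only the two feature columns match
    System.out.println(matchingColumns(names, "feat_.*"));  // prints [1, 2]
    // the default expression .* matches every column
    System.out.println(matchingColumns(names, ".*"));       // prints [0, 1, 2, 3]
  }
}
```

With the default expression .* every column is selected, which is in line with the default range first-last of the -col-range option.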
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected com.github.waikatodatamining.matrix.algorithm.PCA m_Algorithm - the actual algorithm.
protected boolean m_Center - whether to center (rather than standardize) the data and compute PCA from covariance (rather than correlation) matrix.
protected int m_MaxColumns - the maximum number of attributes.
protected int m_NumColumns - the number of columns that got determined.
protected com.github.waikatodatamining.matrix.core.Matrix m_Transformed - temp matrix to avoid duplicate transformation.
protected double m_Variance - the variance to cover.
-
Fields inherited from class adams.ml.preprocessing.AbstractColumnSubsetFilter
m_ClassColumns, m_ColRange, m_ColRegExp, m_ColumnSubset, m_DataColumns, m_DropOtherColumns, m_OtherColumns
-
Fields inherited from class adams.ml.preprocessing.AbstractFilter
m_Initialized, m_OutputFormat
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
Constructors (Constructor, Description):
PCA()
-
Method Summary
Methods (Modifier and Type, Method, Description):
String centerTipText() - Returns the tip text for this property.
void defineOptions() - Adds options to the internal list of options.
protected Dataset doFilter(Dataset data) - Filters the dataset coming through.
protected void doInitFilter(Dataset data) - Filter-specific initialization.
Capabilities getCapabilities() - Returns the capabilities.
boolean getCenter() - Get whether to center (rather than standardize) the data.
int getMaxColumns() - Returns the maximum number of columns to generate.
double getVariance() - Returns the variance.
String globalInfo() - Returns a string describing the object.
protected Dataset initOutputFormat(Dataset data) - Initializes the output format.
String maxColumnsTipText() - Returns the tip text for this property.
protected void reset() - Resets the scheme.
void setCenter(boolean center) - Set whether to center (rather than standardize) the data.
void setMaxColumns(int value) - Sets the maximum number of columns to generate.
void setVariance(double value) - Sets the variance.
String varianceTipText() - Returns the tip text for this property.
-
Methods inherited from class adams.ml.preprocessing.AbstractColumnSubsetBatchFilter
filter, initFilter, postInitFilter, preInitFilter
-
Methods inherited from class adams.ml.preprocessing.AbstractColumnSubsetFilter
colRangeTipText, colRegExpTipText, columnSubsetTipText, dropOtherColumnsTipText, getColRange, getColRegExp, getColumnSubset, getDropOtherColumns, initColumns, setColRange, setColRegExp, setColumnSubset, setDropOtherColumns
-
Methods inherited from class adams.ml.preprocessing.AbstractFilter
appendData, appendHeader, getOutputFormat, isInitialized
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.ml.preprocessing.ColumnSubsetFilter
getOutputFormat, isInitialized
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
-
-
-
Field Detail
-
m_Variance
protected double m_Variance
the variance to cover.
-
m_MaxColumns
protected int m_MaxColumns
the maximum number of attributes.
-
m_Center
protected boolean m_Center
whether to center (rather than standardize) the data and compute PCA from covariance (rather than correlation) matrix.
-
m_Algorithm
protected com.github.waikatodatamining.matrix.algorithm.PCA m_Algorithm
the actual algorithm.
-
m_NumColumns
protected int m_NumColumns
the number of columns that got determined.
-
m_Transformed
protected transient com.github.waikatodatamining.matrix.core.Matrix m_Transformed
temp matrix to avoid duplicate transformation.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractColumnSubsetFilter
-
reset
protected void reset()
Resets the scheme.
- Overrides:
reset in class AbstractColumnSubsetFilter
-
setVariance
public void setVariance(double value)
Sets the variance.
- Parameters:
value - the variance
-
getVariance
public double getVariance()
Returns the variance.
- Returns:
- the variance
-
varianceTipText
public String varianceTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setMaxColumns
public void setMaxColumns(int value)
Sets the maximum number of columns to generate.
- Parameters:
value - the maximum
-
getMaxColumns
public int getMaxColumns()
Returns the maximum number of columns to generate.
- Returns:
- the maximum
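The variance and maxColumns properties both limit how many components end up in the output. The following is a minimal sketch of one plausible way the two could interact, not the library's confirmed logic: it assumes the eigenvalues are sorted in descending order, that components are added until the requested share of the total variance is covered, and that a positive maxColumns then caps the count. The class ComponentCountSketch and the method numComponents are hypothetical names.

```java
/** Hypothetical helper, not part of the documented API. */
public class ComponentCountSketch {

  /**
   * Determines the number of components to keep.
   *
   * @param eigenvalues the eigenvalues, sorted in descending order (assumption)
   * @param variance the share of the total variance to cover, between 0 and 1
   * @param maxColumns the upper limit for the number of components, -1 for no limit
   * @return the number of components to keep
   */
  public static int numComponents(double[] eigenvalues, double variance, int maxColumns) {
    double total = 0.0;
    for (double ev : eigenvalues)
      total += ev;

    // add components until the requested share of the variance is covered
    double target = variance * total;
    double covered = 0.0;
    int count = 0;
    while (count < eigenvalues.length && covered < target) {
      covered += eigenvalues[count];
      count++;
    }

    // apply the cap only if one has been set
    if (maxColumns > 0 && count > maxColumns)
      count = maxColumns;
    return count;
  }

  public static void main(String[] args) {
    double[] eigenvalues = {4.0, 3.0, 2.0, 1.0};
    System.out.println(numComponents(eigenvalues, 0.95, -1));  // prints 4
    System.out.println(numComponents(eigenvalues, 0.5, -1));   // prints 2
    System.out.println(numComponents(eigenvalues, 0.95, 3));   // prints 3
  }
}
```

In this sketch the default maxColumns value of -1 means that only the variance threshold decides the number of generated columns.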
-
maxColumnsTipText
public String maxColumnsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setCenter
public void setCenter(boolean center)
Set whether to center (rather than standardize) the data. If set to true then PCA is computed from the covariance rather than correlation matrix.
- Parameters:
center - true if the data is to be centered rather than standardized
-
getCenter
public boolean getCenter()
Get whether to center (rather than standardize) the data. If true then PCA is computed from the covariance rather than correlation matrix.
- Returns:
- true if the data is to be centered rather than standardized.
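The link between the center property and the covariance or correlation matrix can be illustrated with plain arrays. The sketch below is independent of the library and uses hypothetical names (CenterVsStandardizeSketch, center, standardize, covariance). It assumes the sample covariance with divisor n-1. Centering leaves the covariance between two columns unchanged, whereas standardizing turns it into their correlation.

```java
/** Hypothetical helper, not part of the documented API. */
public class CenterVsStandardizeSketch {

  /** Returns the arithmetic mean of the column. */
  public static double mean(double[] col) {
    double sum = 0.0;
    for (double v : col)
      sum += v;
    return sum / col.length;
  }

  /** Returns the sample covariance (divisor n-1) of two columns of equal length. */
  public static double covariance(double[] a, double[] b) {
    double ma = mean(a);
    double mb = mean(b);
    double sum = 0.0;
    for (int i = 0; i < a.length; i++)
      sum += (a[i] - ma) * (b[i] - mb);
    return sum / (a.length - 1);
  }

  /** Subtracts the mean from each value. */
  public static double[] center(double[] col) {
    double m = mean(col);
    double[] result = new double[col.length];
    for (int i = 0; i < col.length; i++)
      result[i] = col[i] - m;
    return result;
  }

  /** Subtracts the mean and divides by the sample standard deviation. */
  public static double[] standardize(double[] col) {
    double m = mean(col);
    double sd = Math.sqrt(covariance(col, col));
    double[] result = new double[col.length];
    for (int i = 0; i < col.length; i++)
      result[i] = (col[i] - m) / sd;
    return result;
  }

  public static void main(String[] args) {
    double[] x = {1.0, 2.0, 3.0, 4.0};
    double[] y = {2.0, 1.0, 4.0, 3.0};
    // centered data: entry of the covariance matrix
    System.out.println(covariance(center(x), center(y)));
    // standardized data: entry of the correlation matrix
    System.out.println(covariance(standardize(x), standardize(y)));
  }
}
```

Columns on very different scales dominate a covariance-based PCA, which is one common reason to keep standardization, the default here.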
-
centerTipText
public String centerTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getCapabilities
public Capabilities getCapabilities()
Returns the capabilities.
- Returns:
- the capabilities
-
doInitFilter
protected void doInitFilter(Dataset data) throws Exception
Filter-specific initialization.
- Specified by:
doInitFilter in class AbstractColumnSubsetBatchFilter
- Parameters:
data - the data to initialize with
- Throws:
Exception - if initialization fails
-
initOutputFormat
protected Dataset initOutputFormat(Dataset data) throws Exception
Initializes the output format.
- Specified by:
initOutputFormat in class AbstractColumnSubsetBatchFilter
- Parameters:
data - the output format
- Throws:
Exception - if initialization fails
-
-