Class PCA
-
- All Implemented Interfaces:
Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, CapabilitiesHandler, BatchFilter, ColumnSubsetFilter, Filter, Serializable
public class PCA extends AbstractColumnSubsetBatchFilter
Performs principal components analysis.
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-column-subset <RANGE|REGEXP> (property: columnSubset) Defines how to determine the columns to use for filtering. default: RANGE
-col-range <adams.data.spreadsheet.SpreadSheetColumnRange> (property: colRange) The range of columns to use in the filtering process. default: first-last example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; column names (case-sensitive) as well as the following placeholders can be used: first, second, third, last_2, last_1, last; numeric indices can be enforced by preceding them with '#' (e.g. '#12'); column names can be surrounded by double quotes.
-col-regexp <adams.core.base.BaseRegExp> (property: colRegExp) The regular expression to use on the column names to determine whether to use a column for filtering. default: .* more: https://docs.oracle.com/javase/tutorial/essential/regex/ https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
-drop-other-columns <boolean> (property: dropOtherColumns) If enabled, other columns that aren't used for filtering get removed from the output; does not affect any class columns. default: false
-variance <double> (property: variance) The variance to cover. default: 0.95 minimum: 0.0 maximum: 1.0
-max-columns <int> (property: maxColumns) The maximum number of columns to generate. default: -1 minimum: -1
-center <boolean> (property: center) If enabled, the data gets centered rather than standardized, so PCA is computed from the covariance matrix rather than the correlation matrix. default: false
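The range syntax described for -col-range can be illustrated with a small, self-contained sketch. This is not the ADAMS implementation: the class SimpleColumnRange and its methods are hypothetical, and the sketch only handles single 1-based indices, 'start-end' sub-ranges and the placeholders first and last. It ignores 'inv(...)', column names, '#'-prefixed indices and the remaining placeholders.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified parser for a subset of the column range syntax. */
final class SimpleColumnRange {

  /** Resolves a single token ("3", "first", "last") to a 1-based index. */
  static int resolve(String token, int numColumns) {
    String t = token.trim();
    if (t.equals("first")) return 1;
    if (t.equals("last")) return numColumns;
    int index = Integer.parseInt(t);
    if (index < 1 || index > numColumns)
      throw new IllegalArgumentException("Index out of bounds: " + t);
    return index;
  }

  /** Expands a range such as "1,3-4,last" into 0-based column indices. */
  static List<Integer> expand(String range, int numColumns) {
    List<Integer> result = new ArrayList<>();
    for (String part : range.split(",")) {
      // "3-5" yields two bounds, a single index yields one (start == end)
      String[] bounds = part.split("-");
      int start = resolve(bounds[0], numColumns);
      int end = resolve(bounds[bounds.length - 1], numColumns);
      for (int i = start; i <= end; i++) {
        // skip indices that an earlier part already selected
        if (!result.contains(i - 1)) result.add(i - 1);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // the default range "first-last" selects every column
    System.out.println(expand("first-last", 4));
    System.out.println(expand("1,3-4,last", 6));
  }
}
```

The indices are converted to 0-based values because that is how Java collections are usually addressed; whether the library does the same internally is an assumption.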
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected com.github.waikatodatamining.matrix.algorithm.PCA | m_Algorithm | the actual algorithm.
protected boolean | m_Center | whether to center (rather than standardize) the data and compute PCA from covariance (rather than correlation) matrix.
protected int | m_MaxColumns | the maximum number of attributes.
protected int | m_NumColumns | the number of columns that got determined.
protected com.github.waikatodatamining.matrix.core.Matrix | m_Transformed | temp matrix to avoid duplicate transformation.
protected double | m_Variance | the variance to cover.
-
Fields inherited from class adams.ml.preprocessing.AbstractColumnSubsetFilter
m_ClassColumns, m_ColRange, m_ColRegExp, m_ColumnSubset, m_DataColumns, m_DropOtherColumns, m_OtherColumns
-
Fields inherited from class adams.ml.preprocessing.AbstractFilter
m_Initialized, m_OutputFormat
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
Constructors
Constructor | Description
PCA() |
-
Method Summary
Modifier and Type | Method | Description
String | centerTipText() | Returns the tip text for this property.
void | defineOptions() | Adds options to the internal list of options.
protected Dataset | doFilter(Dataset data) | Filters the dataset coming through.
protected void | doInitFilter(Dataset data) | Filter-specific initialization.
Capabilities | getCapabilities() | Returns the capabilities.
boolean | getCenter() | Gets whether to center (rather than standardize) the data.
int | getMaxColumns() | Returns the maximum number of columns to generate.
double | getVariance() | Returns the variance.
String | globalInfo() | Returns a string describing the object.
protected Dataset | initOutputFormat(Dataset data) | Initializes the output format.
String | maxColumnsTipText() | Returns the tip text for this property.
protected void | reset() | Resets the scheme.
void | setCenter(boolean center) | Sets whether to center (rather than standardize) the data.
void | setMaxColumns(int value) | Sets the maximum number of columns to generate.
void | setVariance(double value) | Sets the variance.
String | varianceTipText() | Returns the tip text for this property.
-
Methods inherited from class adams.ml.preprocessing.AbstractColumnSubsetBatchFilter
filter, initFilter, postInitFilter, preInitFilter
-
Methods inherited from class adams.ml.preprocessing.AbstractColumnSubsetFilter
colRangeTipText, colRegExpTipText, columnSubsetTipText, dropOtherColumnsTipText, getColRange, getColRegExp, getColumnSubset, getDropOtherColumns, initColumns, setColRange, setColRegExp, setColumnSubset, setDropOtherColumns
-
Methods inherited from class adams.ml.preprocessing.AbstractFilter
appendData, appendHeader, getOutputFormat, isInitialized
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.ml.preprocessing.ColumnSubsetFilter
getOutputFormat, isInitialized
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
Field Detail
-
m_Variance
protected double m_Variance
the variance to cover.
-
m_MaxColumns
protected int m_MaxColumns
the maximum number of attributes.
-
m_Center
protected boolean m_Center
whether to center (rather than standardize) the data and compute PCA from covariance (rather than correlation) matrix.
-
m_Algorithm
protected com.github.waikatodatamining.matrix.algorithm.PCA m_Algorithm
the actual algorithm.
-
m_NumColumns
protected int m_NumColumns
the number of columns that got determined.
-
m_Transformed
protected transient com.github.waikatodatamining.matrix.core.Matrix m_Transformed
temp matrix to avoid duplicate transformation.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
- globalInfo in interface GlobalInfoSupporter
- Specified by:
- globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
- defineOptions in interface OptionHandler
- Overrides:
- defineOptions in class AbstractColumnSubsetFilter
-
reset
protected void reset()
Resets the scheme.
- Overrides:
- reset in class AbstractColumnSubsetFilter
-
setVariance
public void setVariance(double value)
Sets the variance.
- Parameters:
- value - the variance
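How a "variance to cover" setting typically determines the number of retained components can be sketched as follows. This is a minimal illustration, not the library's code: the class VarianceCoverage is hypothetical, the eigenvalues are assumed to be sorted in descending order, and the use of ">=" for the threshold comparison is an assumption.

```java
/** Hypothetical sketch: pick the number of components from a variance threshold. */
final class VarianceCoverage {

  /**
   * Returns the smallest number of leading components whose eigenvalues
   * account for at least the given proportion (0.0 to 1.0) of the total
   * variance. Eigenvalues are expected in descending order.
   */
  static int numComponents(double[] eigenvalues, double variance) {
    double total = 0.0;
    for (double ev : eigenvalues) total += ev;
    if (total <= 0.0) return 0;  // no variance to cover
    double cumulative = 0.0;
    for (int i = 0; i < eigenvalues.length; i++) {
      cumulative += eigenvalues[i];
      if (cumulative / total >= variance) return i + 1;
    }
    return eigenvalues.length;
  }

  public static void main(String[] args) {
    double[] eigenvalues = {6.0, 3.0, 1.0};
    // with the default of 0.95, all three components are needed here
    System.out.println(numComponents(eigenvalues, 0.95));
  }
}
```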
-
getVariance
public double getVariance()
Returns the variance.
- Returns:
- the variance
-
varianceTipText
public String varianceTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setMaxColumns
public void setMaxColumns(int value)
Sets the maximum number of columns to generate.
- Parameters:
- value - the maximum
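A column limit of this kind is usually applied as a cap on the number of components that was determined otherwise. The sketch below shows one way to do this. It assumes that the default of -1 means "no limit", which the documentation above does not state explicitly, and it treats 0 the same way. The class ColumnLimit is hypothetical.

```java
/** Hypothetical sketch: cap the number of generated columns. */
final class ColumnLimit {

  /**
   * Caps the determined column count. Values below 1 (such as the
   * default -1) are treated as "no limit"; this is an assumption.
   */
  static int apply(int determined, int maxColumns) {
    if (maxColumns < 1) return determined;
    return Math.min(determined, maxColumns);
  }

  public static void main(String[] args) {
    System.out.println(apply(7, -1));  // assumed: no limit applied
    System.out.println(apply(7, 3));   // capped at the maximum
  }
}
```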
-
getMaxColumns
public int getMaxColumns()
Returns the maximum number of columns to generate.
- Returns:
- the maximum
-
maxColumnsTipText
public String maxColumnsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setCenter
public void setCenter(boolean center)
Sets whether to center (rather than standardize) the data. If set to true, PCA is computed from the covariance matrix rather than the correlation matrix.
- Parameters:
- center - true if the data is to be centered rather than standardized
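The difference between centering and standardizing, and why it corresponds to the covariance versus the correlation matrix, can be illustrated with a small sketch. The class CenterVsStandardize is hypothetical and uses the sample standard deviation (division by n - 1), which is an assumption about the normalization. The covariance of two standardized columns equals their correlation, so the diagonal of the resulting matrix is 1.

```java
/** Hypothetical sketch: centering vs. standardizing a column before PCA. */
final class CenterVsStandardize {

  /** Subtracts the column mean; if center is false, also divides by the sample std dev. */
  static double[] prepare(double[] column, boolean center) {
    int n = column.length;
    double mean = 0.0;
    for (double v : column) mean += v;
    mean /= n;
    double sumSq = 0.0;
    for (double v : column) sumSq += (v - mean) * (v - mean);
    double stdDev = Math.sqrt(sumSq / (n - 1));
    double[] result = new double[n];
    for (int i = 0; i < n; i++) {
      result[i] = column[i] - mean;
      // constant columns (stdDev of 0) are only centered
      if (!center && stdDev > 0.0) result[i] /= stdDev;
    }
    return result;
  }

  /** Sample covariance of two columns that already have zero mean. */
  static double covariance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum / (a.length - 1);
  }

  public static void main(String[] args) {
    double[] x = {1, 2, 3, 4};
    double[] y = {2, 4, 6, 8};
    // centered: entry of the covariance matrix
    System.out.println(covariance(prepare(x, true), prepare(y, true)));
    // standardized: entry of the correlation matrix
    System.out.println(covariance(prepare(x, false), prepare(y, false)));
  }
}
```

With centering only, columns measured on larger scales dominate the components; standardizing removes that effect.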
-
getCenter
public boolean getCenter()
Gets whether to center (rather than standardize) the data. If true, PCA is computed from the covariance matrix rather than the correlation matrix.
- Returns:
- true if the data is to be centered rather than standardized.
-
centerTipText
public String centerTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getCapabilities
public Capabilities getCapabilities()
Returns the capabilities.
- Returns:
- the capabilities
-
doInitFilter
protected void doInitFilter(Dataset data) throws Exception
Filter-specific initialization.
- Specified by:
- doInitFilter in class AbstractColumnSubsetBatchFilter
- Parameters:
- data - the data to initialize with
- Throws:
- Exception - if initialization fails
-
initOutputFormat
protected Dataset initOutputFormat(Dataset data) throws Exception
Initializes the output format.
- Specified by:
- initOutputFormat in class AbstractColumnSubsetBatchFilter
- Parameters:
- data - the output format
- Throws:
- Exception - if initialization fails
-
-