Package adams.data.weka.datasetsplitter
Class ColumnSplitter
- java.lang.Object
  - adams.core.logging.LoggingObject
    - adams.core.logging.CustomLoggingLevelObject
      - adams.core.option.AbstractOptionHandler
        - adams.data.weka.datasetsplitter.AbstractSplitter
          - adams.data.weka.datasetsplitter.ColumnSplitter
- All Implemented Interfaces: adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.SizeOfHandler, Serializable
public class ColumnSplitter extends AbstractSplitter
Splits a dataset in two based on the columns selected by a column-finder. Selected columns go in the first dataset, and the rest go in the second.
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-column-finder <adams.data.weka.columnfinder.ColumnFinder> (property: columnFinder) Column-finder defining which attributes go into which dataset. default: adams.data.weka.columnfinder.NullFinder
- Author: Corey Sterling (csterlin at waikato dot ac dot nz)
- See Also: Serialized Form
Field Summary
protected ColumnFinder m_ColumnFinder
    Column-finder for selecting which attributes go in which dataset.
protected int[][] m_SourceLookup
    Mapping from the split attributes to their source in the original dataset.
Constructor Summary
ColumnSplitter()
Method Summary
String check(weka.core.Instances dataset)
    Checks that the input data is correctly formatted for our purposes.
String columnFinderTipText()
    Gets the tip-text for the columnFinder option.
void defineOptions()
    Adds options to the internal list of options.
ColumnFinder getColumnFinder()
    Gets the column finder.
protected int getSelectedColumn(int[] selectedColumns, int index)
    Gets the column number of the selected column at the given index.
protected int[] getUnselectedColumns(int[] selectedColumns, int numColumns)
    Creates an int[] which contains the unselected columns.
String globalInfo()
    Returns a string describing the object.
protected weka.core.Instance newInstanceForDataset(weka.core.Instances dataset)
    Creates a new empty instance suited to the given dataset.
void setColumnFinder(ColumnFinder value)
    Sets the column finder.
weka.core.Instances[] split(weka.core.Instances dataset)
    Splits the given dataset into a number of other datasets.
protected ArrayList<weka.core.Attribute>[] splitAttributes(weka.core.Instances dataset)
    Creates the attribute lists for the two datasets resulting from this split.
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
Field Detail
m_ColumnFinder
protected ColumnFinder m_ColumnFinder
Column-finder for selecting which attributes go in which dataset.
m_SourceLookup
protected int[][] m_SourceLookup
Mapping from the split attributes to their source in the original dataset.
Method Detail
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by: globalInfo in interface adams.core.GlobalInfoSupporter
- Specified by: globalInfo in class adams.core.option.AbstractOptionHandler
- Returns: a description suitable for displaying in the GUI
defineOptions
public void defineOptions()
Adds options to the internal list of options. Derived classes must override this method to add additional options.
- Specified by: defineOptions in interface adams.core.option.OptionHandler
- Overrides: defineOptions in class adams.core.option.AbstractOptionHandler
getColumnFinder
public ColumnFinder getColumnFinder()
Gets the column finder.
- Returns: The column finder.
setColumnFinder
public void setColumnFinder(ColumnFinder value)
Sets the column finder.
- Parameters:
  value - The column finder.
columnFinderTipText
public String columnFinderTipText()
Gets the tip-text for the columnFinder option.
- Returns: The tip-text as a string.
check
public String check(weka.core.Instances dataset)
Checks that the input data is correctly formatted for our purposes.
- Parameters:
  dataset - The dataset to check.
- Returns: Null if all okay, or an error message if not.
getUnselectedColumns
protected int[] getUnselectedColumns(int[] selectedColumns, int numColumns)
Creates an int[] containing the unselected columns, i.e. all column indices up to numColumns that are not in selectedColumns.
- Parameters:
  selectedColumns - The columns to exclude from the array. Must be sorted.
  numColumns - The total number of columns.
- Returns: The array of columns not in selectedColumns.
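The contract of getUnselectedColumns can be illustrated with a minimal stand-alone sketch. It is not the ADAMS source: the class and method names below are hypothetical, and it assumes 0-based column indices, numColumns as an exclusive upper bound, and selectedColumns sorted in ascending order, as the parameter description requires.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not the ADAMS implementation.
class UnselectedColumnsSketch {

  /**
   * Returns every index in [0, numColumns) that does not occur in
   * selectedColumns. Assumes selectedColumns is sorted ascending.
   */
  static int[] unselectedColumns(int[] selectedColumns, int numColumns) {
    int[] result = new int[numColumns];
    int count = 0;
    int next = 0; // position of the next selected index to compare against
    for (int col = 0; col < numColumns; col++) {
      // Skip selected entries smaller than col (also covers duplicates).
      while (next < selectedColumns.length && selectedColumns[next] < col)
        next++;
      if (next < selectedColumns.length && selectedColumns[next] == col)
        continue; // col is selected, so it is left out
      result[count++] = col;
    }
    return Arrays.copyOf(result, count);
  }

  public static void main(String[] args) {
    // Columns 1 and 3 of five are selected, so 0, 2 and 4 remain.
    System.out.println(Arrays.toString(unselectedColumns(new int[]{1, 3}, 5))); // [0, 2, 4]
  }
}
```

Because the input is sorted, a single pass with one cursor is enough; an unsorted input would need a set or a boolean mask instead.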
getSelectedColumn
protected int getSelectedColumn(int[] selectedColumns, int index)
Gets the column number of the selected column at the given index.
- Parameters:
  selectedColumns - The array of selected columns.
  index - The index of the column to get.
- Returns: The number of the selected column, or -1 if index out of range.
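The bounds-checked lookup described for getSelectedColumn might look like the following sketch. The names are hypothetical, and it assumes that "out of range" means an index below 0 or at or beyond the length of selectedColumns.

```java
// Hypothetical stand-alone sketch; not the ADAMS implementation.
class SelectedColumnSketch {

  /** Returns selectedColumns[index], or -1 when index is out of range. */
  static int selectedColumn(int[] selectedColumns, int index) {
    if (index < 0 || index >= selectedColumns.length)
      return -1; // sentinel instead of an ArrayIndexOutOfBoundsException
    return selectedColumns[index];
  }

  public static void main(String[] args) {
    int[] selected = {2, 5, 7};
    System.out.println(selectedColumn(selected, 1)); // 5
    System.out.println(selectedColumn(selected, 3)); // -1
  }
}
```

Returning -1 lets a caller step past the end of the array without special-casing the last element.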
splitAttributes
protected ArrayList<weka.core.Attribute>[] splitAttributes(weka.core.Instances dataset)
Creates the attribute lists for the two datasets resulting from this split.
- Parameters:
  dataset - The dataset being split.
- Returns: Two lists, the first containing the selected attributes, the second containing the rest.
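One way to picture the two attribute lists, together with a source mapping in the spirit of m_SourceLookup, is the sketch below. It uses plain attribute names instead of weka.core.Attribute. The class, the method names, and the layout lookup[d][i] = original index of attribute i in output d are assumptions for illustration, not confirmed details of the implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; attribute names stand in for weka.core.Attribute.
class SplitAttributesSketch {

  /** Builds the source indices per output: [0] = selected columns, [1] = the rest. */
  static int[][] sourceLookup(int[] selectedColumns, int numColumns) {
    boolean[] isSelected = new boolean[numColumns];
    for (int col : selectedColumns)
      isSelected[col] = true;
    List<Integer> rest = new ArrayList<>();
    for (int col = 0; col < numColumns; col++)
      if (!isSelected[col])
        rest.add(col);
    int[] restArray = rest.stream().mapToInt(Integer::intValue).toArray();
    return new int[][] {selectedColumns.clone(), restArray};
  }

  /** Resolves each lookup entry to the attribute name it points at. */
  static List<List<String>> splitAttributes(List<String> names, int[][] lookup) {
    List<List<String>> result = new ArrayList<>();
    for (int[] sources : lookup) {
      List<String> part = new ArrayList<>();
      for (int source : sources)
        part.add(names.get(source));
      result.add(part);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("id", "width", "height", "class");
    int[][] lookup = sourceLookup(new int[]{0, 3}, names.size());
    System.out.println(splitAttributes(names, lookup)); // [[id, class], [width, height]]
  }
}
```

Keeping the mapping separate from the attribute lists means the same lookup can later be reused to copy values row by row.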
newInstanceForDataset
protected weka.core.Instance newInstanceForDataset(weka.core.Instances dataset)
Creates a new empty instance suited to the given dataset.
- Parameters:
  dataset - The dataset to create the instance for.
- Returns: The created instance.
split
public weka.core.Instances[] split(weka.core.Instances dataset)
Splits the given dataset into a number of other datasets. Should be implemented by sub-classes to perform actual splitting.
- Specified by: split in class AbstractSplitter
- Parameters:
  dataset - The dataset to split.
- Returns: An array of datasets resulting from the split.
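The effect of a column-wise split on the data rows can be sketched with plain arrays. This is an illustration only: the SplitRowsSketch class, its methods, and the use of double[][] in place of weka.core.Instances are assumptions, and the sketch makes no claim about how ColumnSplitter copies values internally.

```java
import java.util.Arrays;

// Hypothetical illustration; double[][] stands in for weka.core.Instances.
class SplitRowsSketch {

  /** Projects every row onto the given source columns, in the given order. */
  static double[][] project(double[][] rows, int[] sourceColumns) {
    double[][] out = new double[rows.length][sourceColumns.length];
    for (int r = 0; r < rows.length; r++)
      for (int c = 0; c < sourceColumns.length; c++)
        out[r][c] = rows[r][sourceColumns[c]];
    return out;
  }

  /** Returns two tables: selected columns first, remaining columns second. */
  static double[][][] split(double[][] rows, int[] selected, int[] rest) {
    return new double[][][] {project(rows, selected), project(rows, rest)};
  }

  public static void main(String[] args) {
    double[][] rows = {{1, 2, 3}, {4, 5, 6}};
    double[][][] parts = split(rows, new int[]{1}, new int[]{0, 2});
    System.out.println(Arrays.deepToString(parts[0])); // [[2.0], [5.0]]
    System.out.println(Arrays.deepToString(parts[1])); // [[1.0, 3.0], [4.0, 6.0]]
  }
}
```

Both outputs keep the same number of rows as the input; only the columns are partitioned.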