Class ColumnSplitter

  • All Implemented Interfaces:
    Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, Serializable

    public class ColumnSplitter
    extends AbstractSplitter
    Splits a dataset in two based on the columns selected by a column-finder. Selected columns go in the first dataset, and the rest go in the second.

    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -column-finder <adams.data.weka.columnfinder.ColumnFinder> (property: columnFinder)
        Column-finder defining which attributes go into which dataset.
        default: adams.data.weka.columnfinder.NullFinder
     
    Author:
    Corey Sterling (csterlin at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • m_ColumnFinder

        protected ColumnFinder m_ColumnFinder
        Column-finder for selecting which attributes go in which dataset.
      • m_SourceLookup

        protected int[][] m_SourceLookup
        Mapping from the split attributes to their source in the original dataset.
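        The lookup can be pictured with a small stand-alone sketch. It assumes, which the documentation above does not state, that the first index of the int[][] picks the output dataset (0 for selected, 1 for the rest) and the second picks the attribute within that dataset. The class and method names are hypothetical and are not part of ADAMS.

```java
// Hypothetical illustration only; not the ADAMS implementation.
final class SourceLookupSketch {

    /**
     * Builds a two-row lookup: row 0 lists the original indices of the
     * selected columns, row 1 lists the original indices of all others.
     * Assumes selectedColumns holds valid, distinct indices.
     */
    static int[][] buildSourceLookup(int[] selectedColumns, int numColumns) {
        boolean[] isSelected = new boolean[numColumns];
        for (int col : selectedColumns) {
            isSelected[col] = true;
        }
        int[] rest = new int[numColumns - selectedColumns.length];
        int out = 0;
        for (int col = 0; col < numColumns; col++) {
            if (!isSelected[col]) {
                rest[out++] = col;
            }
        }
        return new int[][] {selectedColumns.clone(), rest};
    }
}
```

        Under that assumption, lookup[1][0] would give the original column of the first attribute in the second dataset.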
    • Constructor Detail

      • ColumnSplitter

        public ColumnSplitter()
    • Method Detail

      • getColumnFinder

        public ColumnFinder getColumnFinder()
        Gets the column finder.
        Returns:
        The column finder.
      • setColumnFinder

        public void setColumnFinder(ColumnFinder value)
        Sets the column finder.
        Parameters:
        value - The column finder.
      • columnFinderTipText

        public String columnFinderTipText()
        Gets the tip-text for the columnFinder option.
        Returns:
        The tip-text as a string.
      • check

        public String check(weka.core.Instances dataset)
        Checks that the input data is correctly formatted for our purposes.
        Parameters:
        dataset - The dataset to check.
        Returns:
        null if the dataset is acceptable, otherwise an error message.
      • getUnselectedColumns

        protected int[] getUnselectedColumns(int[] selectedColumns,
                                             int numColumns)
        Creates an int[] containing the unselected columns, i.e. all column indices below numColumns that are not in selectedColumns.
        Parameters:
        selectedColumns - The columns to exclude from the array. Must be sorted.
        numColumns - The total number of columns.
        Returns:
        The array of columns not in selectedColumns.
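        A minimal sketch of how such a complement could be computed in one pass. It is a hypothetical stand-alone version, not the ADAMS source, and it assumes indices start at 0 and that selectedColumns is sorted ascending without duplicates or out-of-range values.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not the ADAMS implementation.
final class UnselectedColumnsSketch {

    /**
     * Returns every index in [0, numColumns) that does not occur in
     * selectedColumns. Relies on selectedColumns being sorted ascending.
     */
    static int[] getUnselectedColumns(int[] selectedColumns, int numColumns) {
        int[] unselected = new int[numColumns - selectedColumns.length];
        int next = 0; // position of the next selected index to skip
        int out = 0;  // write position in the result
        for (int col = 0; col < numColumns; col++) {
            if (next < selectedColumns.length && selectedColumns[next] == col) {
                next++; // col is selected, so leave it out
            } else {
                unselected[out++] = col;
            }
        }
        return unselected;
    }

    public static void main(String[] args) {
        int[] rest = getUnselectedColumns(new int[] {1, 3}, 5);
        System.out.println(Arrays.toString(rest)); // prints [0, 2, 4]
    }
}
```

        The single forward scan is why the sorted precondition matters here: with unsorted input, the skip pointer would miss selected indices it has already passed.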
      • getSelectedColumn

        protected int getSelectedColumn(int[] selectedColumns,
                                        int index)
        Gets the column number of the selected column at the given index.
        Parameters:
        selectedColumns - The array of selected columns.
        index - The index of the column to get.
        Returns:
        The number of the selected column, or -1 if the index is out of range.
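        The documented contract amounts to a bounds-checked array read. The following is a hypothetical sketch of that contract only; it assumes a negative index also counts as out of range, which the description does not spell out.

```java
// Hypothetical stand-alone sketch; not the ADAMS implementation.
final class SelectedColumnSketch {

    /** Returns selectedColumns[index], or -1 when index is out of range. */
    static int getSelectedColumn(int[] selectedColumns, int index) {
        if (index < 0 || index >= selectedColumns.length) {
            return -1; // sentinel instead of an ArrayIndexOutOfBoundsException
        }
        return selectedColumns[index];
    }
}
```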
      • splitAttributes

        protected ArrayList<weka.core.Attribute>[] splitAttributes(weka.core.Instances dataset)
        Creates the attribute lists for the two datasets resulting from this split.
        Parameters:
        dataset - The dataset being split.
        Returns:
        Two lists, the first containing the selected attributes, the second containing the rest.
      • newInstanceForDataset

        protected weka.core.Instance newInstanceForDataset(weka.core.Instances dataset)
        Creates a new empty instance suited to the given dataset.
        Parameters:
        dataset - The dataset to create the instance for.
        Returns:
        The created instance.
      • split

        public weka.core.Instances[] split(weka.core.Instances dataset)
        Splits the given dataset into two datasets according to the configured column finder: the first holds the selected columns, the second holds the rest.
        Specified by:
        split in class AbstractSplitter
        Parameters:
        dataset - The dataset to split.
        Returns:
        An array of datasets resulting from the split.
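        To make the column-wise split concrete without depending on Weka, the sketch below uses a plain double[][] as a stand-in for the dataset (one inner array per row). The class name, the split signature, and the columnsPerPart parameter are hypothetical; the real method works on weka.core.Instances and obtains the selected columns from the column finder.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not the ADAMS implementation.
final class ColumnSplitSketch {

    /**
     * Copies columns of rows into one output table per entry of
     * columnsPerPart. columnsPerPart[d][a] names the original column
     * that becomes attribute a of output table d.
     */
    static double[][][] split(double[][] rows, int[][] columnsPerPart) {
        double[][][] parts = new double[columnsPerPart.length][rows.length][];
        for (int d = 0; d < columnsPerPart.length; d++) {
            for (int r = 0; r < rows.length; r++) {
                double[] out = new double[columnsPerPart[d].length];
                for (int a = 0; a < out.length; a++) {
                    out[a] = rows[r][columnsPerPart[d][a]];
                }
                parts[d][r] = out;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        double[][] rows = {{10, 11, 12}, {20, 21, 22}};
        // Column 1 is "selected"; columns 0 and 2 are the rest.
        double[][][] parts = split(rows, new int[][] {{1}, {0, 2}});
        System.out.println(Arrays.deepToString(parts[0])); // prints [[11.0], [21.0]]
        System.out.println(Arrays.deepToString(parts[1])); // prints [[10.0, 12.0], [20.0, 22.0]]
    }
}
```

        Both output tables keep the same number of rows as the input; only the columns are divided between them.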