Package adams.ml.data

Class DatasetView

    • Field Detail

      • m_Rows

        protected gnu.trove.list.array.TIntArrayList m_Rows
        the row subset to use (null for all).
      • m_RowArray

        protected int[] m_RowArray
        the row array.
      • m_Columns

        protected gnu.trove.list.array.TIntArrayList m_Columns
        the column subset to use (null for all).
      • m_ColumnArray

        protected int[] m_ColumnArray
        the column array.
      • m_Dataset

        protected Dataset m_Dataset
        the underlying dataset.
      • m_HeaderRow

        protected HeaderRow m_HeaderRow
        the cached header row.
    • Constructor Detail

      • DatasetView

        public DatasetView()
        Initializes the view with a dummy dataset.
      • DatasetView

        public DatasetView(Dataset dataset,
                           int[] rows,
                           int[] columns)
        Initializes the view.
        Parameters:
        dataset - the underlying dataset
        rows - the rows to use, null for all
        columns - the columns to use, null for all
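The constructor describes a view as a pair of optional index subsets laid over a shared dataset. The following is a minimal sketch of that idea. It assumes a plain String[][] table in place of Dataset, and the class TableView and its get method are hypothetical. The sketch only mirrors the documented semantics: null selects all rows or columns, and view indices are translated to underlying indices in the way getActualRow(int) and getActualColumn(int) describe.

```java
import java.util.stream.IntStream;

/**
 * Hypothetical, self-contained sketch of a row/column view over a String[][] table.
 * It mimics the documented semantics (null = all rows/columns), not the ADAMS code.
 */
final class TableView {
    private final String[][] data;  // underlying table; shared, never copied
    private final int[] rows;       // view row index -> underlying row index
    private final int[] columns;    // view column index -> underlying column index

    TableView(String[][] data, int[] rows, int[] columns) {
        this.data = data;
        int numCols = data.length == 0 ? 0 : data[0].length;
        // null means "use all", expanded here to the identity mapping
        this.rows = (rows == null) ? IntStream.range(0, data.length).toArray() : rows.clone();
        this.columns = (columns == null) ? IntStream.range(0, numCols).toArray() : columns.clone();
    }

    int getRowCount() {
        return rows.length;
    }

    int getColumnCount() {
        return columns.length;
    }

    // counterparts of getActualRow(int) / getActualColumn(int)
    int getActualRow(int rowIndex) {
        return rows[rowIndex];
    }

    int getActualColumn(int colIndex) {
        return columns[colIndex];
    }

    // reads through to the underlying table
    String get(int rowIndex, int colIndex) {
        return data[getActualRow(rowIndex)][getActualColumn(colIndex)];
    }
}
```

Because the sketch stores only index arrays, changes made to the underlying table remain visible through the view.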
    • Method Detail

      • assign

        public void assign(SpreadSheet sheet)
        Uses the given spreadsheet instead; no copy is made.
        Specified by:
        assign in interface SpreadSheet
        Parameters:
        sheet - the sheet to use
      • getDataRowClass

        public Class getDataRowClass()
        Returns the class used for rows.
        Specified by:
        getDataRowClass in interface SpreadSheet
        Returns:
        the class
      • newInstance

        public SpreadSheet newInstance()
        Returns a new instance.
        Specified by:
        newInstance in interface SpreadSheet
        Returns:
        the new instance, null if failed to create new instance
      • getHeader

        public Dataset getHeader()
        Returns the view with the same header and comments.
        Specified by:
        getHeader in interface Dataset
        Specified by:
        getHeader in interface SpreadSheet
        Returns:
        the spreadsheet
      • indexOfColumn

        public int indexOfColumn(String name)
        Returns the index of the column using the specified name.
        Specified by:
        indexOfColumn in interface Dataset
        Parameters:
        name - the name of the column to locate
        Returns:
        the index, -1 if failed to locate
      • getActualRow

        protected int getActualRow(int rowIndex)
        Returns the actual row index.
        Parameters:
        rowIndex - the row in the view
        Returns:
        the underlying row index
      • getActualColumn

        protected int getActualColumn(int colIndex)
        Returns the actual column index.
        Parameters:
        colIndex - the column in the view
        Returns:
        the underlying column index
      • getActualRow

        protected String getActualRow(String rowKey)
        Returns the actual row key.
        Parameters:
        rowKey - the row key in the view
        Returns:
        the underlying row key, null if not present
      • getActualColumn

        protected String getActualColumn(String cellKey)
        Returns the actual cell key.
        Parameters:
        cellKey - the cell key in the view
        Returns:
        the underlying cell key, null if not present
      • wrap

        protected DataRowView wrap(DataRow row)
        Wraps the data row in a view container.
        Parameters:
        row - the row to wrap
        Returns:
        the wrapped row
      • setName

        public void setName(String value)
        Sets the name of the spreadsheet.
        Specified by:
        setName in interface SpreadSheet
        Parameters:
        value - the name
      • getName

        public String getName()
        Returns the name of the spreadsheet.
        Specified by:
        getName in interface SpreadSheet
        Returns:
        the name, can be null
      • hasName

        public boolean hasName()
        Returns whether the spreadsheet has a name.
        Specified by:
        hasName in interface SpreadSheet
        Returns:
        true if the spreadsheet is named
      • addComment

        public void addComment(String comment)
        Adds the comment to the internal list of comments. If the comment contains newlines, then it gets automatically split into multiple lines and added one by one.
        Specified by:
        addComment in interface SpreadSheet
        Parameters:
        comment - the comment to add
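A minimal sketch of the newline splitting that addComment(String) describes. CommentStore is a hypothetical class, and treating "\r\n" and a lone "\r" as line breaks (in addition to "\n") is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical comment store illustrating the split-on-newline behaviour. */
final class CommentStore {
    private final List<String> comments = new ArrayList<>();

    void addComment(String comment) {
        // each line becomes its own entry; String.split drops trailing empty
        // strings, so a final newline does not add an empty comment
        for (String line : comment.split("\\r?\\n|\\r")) {
            comments.add(line);
        }
    }

    List<String> getComments() {
        return comments;
    }
}
```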
      • addComment

        public void addComment(List<String> comment)
        Adds the comments to the internal list of comments.
        Specified by:
        addComment in interface SpreadSheet
        Parameters:
        comment - the comments to add
      • clear

        public void clear()
        Removes all cells, but leaves comments.
        Not implemented!
        Specified by:
        clear in interface SpreadSheet
      • getColumnName

        public String getColumnName(int colIndex)
        Returns the name of the specified column.
        Specified by:
        getColumnName in interface SpreadSheet
        Parameters:
        colIndex - the index of the column
        Returns:
        the name of the column
      • getColumnNames

        public List<String> getColumnNames()
        Returns a list of the names of all columns (i.e., the content of the header row cells).
        Specified by:
        getColumnNames in interface SpreadSheet
        Returns:
        the names of the columns
      • hasRow

        public boolean hasRow(int rowIndex)
        Returns whether the spreadsheet already contains the row with the given index.
        Specified by:
        hasRow in interface SpreadSheet
        Parameters:
        rowIndex - the index to look for
        Returns:
        true if the row already exists
      • hasRow

        public boolean hasRow(String rowKey)
        Returns whether the spreadsheet already contains the row with the given key.
        Specified by:
        hasRow in interface SpreadSheet
        Parameters:
        rowKey - the key to look for
        Returns:
        true if the row already exists
      • newCell

        public Cell newCell()
        Creates a new cell.
        Specified by:
        newCell in interface SpreadSheet
        Returns:
        the new instance, null in case of an instantiation error
      • addRow

        public DataRow addRow()
        Appends a row to the spreadsheet.
        Not implemented!
        Specified by:
        addRow in interface SpreadSheet
        Returns:
        the created row
      • addRow

        public DataRow addRow(String rowKey)
        Adds a row with the given key to the list and returns the created object. If the row already exists, then this row is returned instead and no new object created.
        Not implemented!
        Specified by:
        addRow in interface SpreadSheet
        Parameters:
        rowKey - the key for the row to create
        Returns:
        the created row or the already existing row
      • insertRow

        public DataRow insertRow(int index)
        Inserts a row at the specified location.
        Not implemented!
        Specified by:
        insertRow in interface SpreadSheet
        Parameters:
        index - the index where to insert the row
        Returns:
        the created row
      • removeRow

        public Row removeRow(int rowIndex)
        Removes the specified row.
        Not implemented!
        Specified by:
        removeRow in interface SpreadSheet
        Parameters:
        rowIndex - the row to remove
        Returns:
        the row that was removed, null if none removed
      • removeRow

        public Row removeRow(String rowKey)
        Removes the specified row.
        Not implemented!
        Specified by:
        removeRow in interface SpreadSheet
        Parameters:
        rowKey - the row to remove
        Returns:
        the row that was removed, null if none removed
      • insertColumn

        public void insertColumn(int columnIndex,
                                 String header)
        Inserts a column at the specified location.
        Not implemented!
        Specified by:
        insertColumn in interface SpreadSheet
        Parameters:
        columnIndex - the position of the column
        header - the name of the column
      • insertColumn

        public void insertColumn(int columnIndex,
                                 String header,
                                 String initial)
        Inserts a column at the specified location.
        Not implemented!
        Specified by:
        insertColumn in interface SpreadSheet
        Parameters:
        columnIndex - the position of the column
        header - the name of the column
        initial - the initial value for the cells, "null" for missing values (in that case no cells are added)
      • insertColumn

        public void insertColumn(int columnIndex,
                                 String header,
                                 String initial,
                                 boolean forceString)
        Inserts a column at the specified location.
        Not implemented!
        Specified by:
        insertColumn in interface SpreadSheet
        Parameters:
        columnIndex - the position of the column
        header - the name of the column
        initial - the initial value for the cells, "null" for missing values (in that case no cells are added)
        forceString - whether to force the value to be set as a string
      • removeColumn

        public boolean removeColumn(int columnIndex)
        Removes the specified column.
        Not implemented!
        Specified by:
        removeColumn in interface SpreadSheet
        Parameters:
        columnIndex - the column to remove
        Returns:
        true if removed
      • removeColumn

        public boolean removeColumn(String columnKey)
        Removes the specified column.
        Not implemented!
        Specified by:
        removeColumn in interface SpreadSheet
        Parameters:
        columnKey - the column to remove
        Returns:
        true if removed
      • getRow

        public DataRow getRow(String rowKey)
        Returns the row associated with the given row key, null if not found.
        Specified by:
        getRow in interface SpreadSheet
        Parameters:
        rowKey - the key of the row to retrieve
        Returns:
        the row or null if not found
      • getRow

        public DataRow getRow(int rowIndex)
        Returns the row at the specified index.
        Specified by:
        getRow in interface SpreadSheet
        Parameters:
        rowIndex - the 0-based index of the row to retrieve
        Returns:
        the row
      • getRowKey

        public String getRowKey(int rowIndex)
        Returns the row key at the specified index.
        Specified by:
        getRowKey in interface SpreadSheet
        Parameters:
        rowIndex - the 0-based index of the row key to retrieve
        Returns:
        the row key
      • getRowIndex

        public int getRowIndex(String rowKey)
        Returns the row index of the specified row.
        Specified by:
        getRowIndex in interface SpreadSheet
        Parameters:
        rowKey - the row identifier
        Returns:
        the 0-based row index, -1 if not found
      • getCellIndex

        public int getCellIndex(String cellKey)
        Returns the cell index of the specified cell (in the header row).
        Specified by:
        getCellIndex in interface SpreadSheet
        Parameters:
        cellKey - the cell identifier
        Returns:
        the 0-based column index, -1 if not found
      • hasCell

        public boolean hasCell(int rowIndex,
                               int columnIndex)
        Checks whether the cell with the given indices already exists.
        Specified by:
        hasCell in interface SpreadSheet
        Parameters:
        rowIndex - the index of the row to look for
        columnIndex - the index of the cell in the row to look for
        Returns:
        true if the cell exists
      • getCell

        public Cell getCell(int rowIndex,
                            int columnIndex)
        Returns the corresponding cell or null if not found.
        Specified by:
        getCell in interface SpreadSheet
        Parameters:
        rowIndex - the index of the row the cell is in
        columnIndex - the column of the cell to retrieve
        Returns:
        the cell or null if not found
      • getCellPosition

        public String getCellPosition(String rowKey,
                                      String cellKey)
        Returns the position of the cell or null if not found. A position combines one or more letters (for the column) with a number (for the row), e.g., "B3".
        Specified by:
        getCellPosition in interface SpreadSheet
        Parameters:
        rowKey - the key of the row the cell is in
        cellKey - the key of the cell to retrieve
        Returns:
        the position string or null if not found
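A position string of this kind uses spreadsheet-style column letters (A to Z, then AA, AB, and so on). The sketch below shows that letter scheme in isolation. CellPositions is a hypothetical helper, the column index is assumed to be 0-based, and the row number is appended as given, because the exact row numbering ADAMS uses (for example, whether the header counts as row 1) is not stated here.

```java
/** Hypothetical helper building a spreadsheet-style position such as "B3". */
final class CellPositions {
    /**
     * @param colIndex  0-based column index (0 -> "A", 25 -> "Z", 26 -> "AA")
     * @param rowNumber the row number to append, used as-is
     */
    static String position(int colIndex, int rowNumber) {
        StringBuilder letters = new StringBuilder();
        int n = colIndex + 1;  // bijective base-26 works on 1-based numbers
        while (n > 0) {
            n--;  // shift so that remainder 0 maps to 'A'
            letters.insert(0, (char) ('A' + n % 26));
            n /= 26;
        }
        return letters.toString() + rowNumber;
    }
}
```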
      • sortRowKeys

        public void sortRowKeys()
        Sorts the rows according to the row keys.
        Not implemented!
        Specified by:
        sortRowKeys in interface SpreadSheet
        See Also:
        rowKeys()
      • sortRowKeys

        public void sortRowKeys(Comparator<String> comp)
        Sorts the rows according to the row keys.
        Not implemented!
        Specified by:
        sortRowKeys in interface SpreadSheet
        Parameters:
        comp - the comparator to use
        See Also:
        rowKeys()
      • sort

        public void sort(int index,
                         boolean asc)
        Sorts the rows based on the values in the specified column.
        Not implemented!
        Specified by:
        sort in interface SpreadSheet
        Parameters:
        index - the index (0-based) of the column to sort on
        asc - whether sorting is ascending (true) or descending (false)
        See Also:
        sort(RowComparator)
      • sort

        public void sort(RowComparator comp)
        Sorts the rows using the given comparator.
        Not implemented!
        Specified by:
        sort in interface SpreadSheet
        Parameters:
        comp - the row comparator to use
      • sort

        public void sort(RowComparator comp,
                         boolean unique)
        Sorts the rows using the given comparator.
        Not implemented!
        Specified by:
        sort in interface SpreadSheet
        Parameters:
        comp - the row comparator to use
        unique - whether to drop any duplicate rows (based on row comparator)
      • getColumnCount

        public int getColumnCount()
        Returns the number of columns.
        Specified by:
        getColumnCount in interface SpreadSheet
        Returns:
        the number of columns
      • getRowCount

        public int getRowCount()
        Returns the number of rows currently stored.
        Specified by:
        getRowCount in interface SpreadSheet
        Returns:
        the number of rows
      • isNumeric

        public boolean isNumeric(int columnIndex)
        Checks whether the given column is numeric or not. Does not accept missing values.
        Specified by:
        isNumeric in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        Returns:
        true if purely numeric
        See Also:
        getContentTypes(int)
      • isNumeric

        public boolean isNumeric(int columnIndex,
                                 boolean allowMissing)
        Checks whether the given column is numeric or not. Can accept missing values.
        Specified by:
        isNumeric in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        allowMissing - whether to accept missing values
        Returns:
        true if purely numeric
        See Also:
        getContentTypes(int)
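The two isNumeric variants differ only in how missing values are treated. The sketch below illustrates that difference on a plain list of strings. NumericCheck is hypothetical: it uses null to mark a missing value and Double.parseDouble to decide whether a value is numeric, whereas SpreadSheet presumably relies on the cell content types referenced under "See Also".

```java
import java.util.List;

/** Hypothetical check mirroring isNumeric(int) and isNumeric(int, boolean). */
final class NumericCheck {
    // like isNumeric(int): missing values are not accepted
    static boolean isNumeric(List<String> column) {
        return isNumeric(column, false);
    }

    static boolean isNumeric(List<String> column, boolean allowMissing) {
        for (String value : column) {
            if (value == null) {          // null stands in for a missing value
                if (!allowMissing) {
                    return false;
                }
                continue;
            }
            try {
                Double.parseDouble(value);
            } catch (NumberFormatException e) {
                return false;             // non-numeric content found
            }
        }
        return true;
    }
}
```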
      • isContentType

        public boolean isContentType(int columnIndex,
                                     Cell.ContentType type)
        Checks whether the given column is of the specific content type or not.
        Specified by:
        isContentType in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        type - the content type to check
        Returns:
        true if column purely consists of this content type
        See Also:
        getContentType(int)
      • getContentType

        public Cell.ContentType getContentType(int columnIndex)
        Returns the pure content type of the given column, if available.
        Specified by:
        getContentType in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        Returns:
        the content type that this column consists of solely, null if mixed
      • getContentTypes

        public Collection<Cell.ContentType> getContentTypes(int columnIndex)
        Returns all content types of the given column, if available.
        Specified by:
        getContentTypes in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        Returns:
        the content types that this column consists of
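getContentType, getContentTypes and isContentType are related: the "pure" type exists only when the set of distinct types has exactly one element. The sketch below shows that relationship. The enum Kind is a hypothetical stand-in for Cell.ContentType with only a few illustrative constants, and whether ADAMS counts missing cells as a type of their own is not assumed.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for Cell.ContentType. */
enum Kind { LONG, DOUBLE, STRING, MISSING }

final class ColumnTypes {
    // all distinct types occurring in the column (cf. getContentTypes)
    static Set<Kind> getContentTypes(List<Kind> column) {
        Set<Kind> result = EnumSet.noneOf(Kind.class);
        result.addAll(column);
        return result;
    }

    // the single type if the column is pure, otherwise null (cf. getContentType)
    static Kind getContentType(List<Kind> column) {
        Set<Kind> types = getContentTypes(column);
        return types.size() == 1 ? types.iterator().next() : null;
    }

    // cf. isContentType: true only if the column consists purely of this type
    static boolean isContentType(List<Kind> column, Kind type) {
        return type == getContentType(column);
    }
}
```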
      • getCellValues

        public List<String> getCellValues(String colKey)
        Returns the unique string values of the specified column. The returned list is sorted.
        Specified by:
        getCellValues in interface SpreadSheet
        Parameters:
        colKey - the column to retrieve the values for
        Returns:
        the sorted list of unique values
      • getCellValues

        public List<String> getCellValues(int colIndex)
        Returns the unique string values of the specified column. The returned list is sorted.
        Specified by:
        getCellValues in interface SpreadSheet
        Parameters:
        colIndex - the column to retrieve the values for
        Returns:
        the sorted list of unique values
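A sorted list of unique strings can be obtained by collecting the values into a TreeSet. The sketch below is hypothetical: skipping missing values (null here) is an assumption, and natural String ordering is lexicographic, so "10" sorts before "9".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper returning the sorted, unique values of a column. */
final class UniqueValues {
    static List<String> getCellValues(List<String> column) {
        TreeSet<String> unique = new TreeSet<>();  // removes duplicates, keeps natural order
        for (String value : column) {
            if (value != null) {                   // assumption: missing values are skipped
                unique.add(value);
            }
        }
        return new ArrayList<>(unique);
    }
}
```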
      • equalsHeader

        public String equalsHeader(SpreadSheet other)
        Compares the header of this spreadsheet with the other one.
        Specified by:
        equalsHeader in interface SpreadSheet
        Parameters:
        other - the other spreadsheet to compare with
        Returns:
        null if equal, otherwise details what differs
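The return convention of equalsHeader (null when equal, otherwise a description of the difference) can be sketched as below. HeaderComparison is hypothetical and compares column names only; the real method may check further properties, such as content types, which this sketch does not attempt.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical header comparison using the "null if equal" convention. */
final class HeaderComparison {
    static String equalsHeader(List<String> mine, List<String> other) {
        if (mine.size() != other.size()) {
            return "Column count differs: " + mine.size() + " != " + other.size();
        }
        for (int i = 0; i < mine.size(); i++) {
            if (!Objects.equals(mine.get(i), other.get(i))) {
                // report the first differing column (1-based for readability)
                return "Column #" + (i + 1) + " differs: " + mine.get(i) + " != " + other.get(i);
            }
        }
        return null;  // headers are equal
    }
}
```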
      • removeMissing

        public boolean removeMissing()
        Removes all cells marked "missing".
        Not implemented!
        Specified by:
        removeMissing in interface SpreadSheet
        Returns:
        true if any cell was removed
      • setDateLenient

        public void setDateLenient(boolean value)
        Sets whether parsing of dates is to be lenient or not.
        Specified by:
        setDateLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
        See Also:
        DateFormat.setLenient(boolean)
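The lenient flags refer to java.text.DateFormat.setLenient(boolean). The sketch below shows what that flag changes for a SimpleDateFormat: lenient parsing rolls an impossible date such as 2020-02-30 over into March, while strict parsing rejects it. LenientDemo is a hypothetical class, and how DatasetView passes the flag on is not shown.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical demo of DateFormat.setLenient(boolean). */
final class LenientDemo {
    // parses the text with the given pattern, returning null if parsing fails
    static Date parse(String pattern, String text, boolean lenient) {
        DateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(lenient);
        try {
            return format.parse(text);
        } catch (ParseException e) {
            return null;  // strict parsing rejects out-of-range fields
        }
    }
}
```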
      • setDateTimeLenient

        public void setDateTimeLenient(boolean value)
        Sets whether parsing of date/times is to be lenient or not.
        Specified by:
        setDateTimeLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
        See Also:
        DateFormat.setLenient(boolean)
      • setDateTimeMsecLenient

        public void setDateTimeMsecLenient(boolean value)
        Sets whether parsing of date/time/msec values is to be lenient or not.
        Specified by:
        setDateTimeMsecLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
        See Also:
        DateFormat.setLenient(boolean)
      • setTimeLenient

        public void setTimeLenient(boolean value)
        Sets whether parsing of times is to be lenient or not.
        Specified by:
        setTimeLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
      • isTimeLenient

        public boolean isTimeLenient()
        Returns whether the parsing of times is lenient or not.
        Specified by:
        isTimeLenient in interface SpreadSheet
        Returns:
        true if parsing is lenient
      • setTimeMsecLenient

        public void setTimeMsecLenient(boolean value)
        Sets whether parsing of times/msec is to be lenient or not.
        Specified by:
        setTimeMsecLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
      • isTimeMsecLenient

        public boolean isTimeMsecLenient()
        Returns whether the parsing of times/msec is lenient or not.
        Specified by:
        isTimeMsecLenient in interface SpreadSheet
        Returns:
        true if parsing is lenient
      • setLocale

        public void setLocale(Locale value)
        Sets the locale. Used in formatting/parsing numbers.
        Specified by:
        setLocale in interface LocaleSupporter
        Specified by:
        setLocale in interface SpreadSheet
        Parameters:
        value - the locale to use
      • calculate

        public void calculate()
        Triggers all formula cells to recalculate their values.
        Specified by:
        calculate in interface SpreadSheet
      • getDataset

        public Dataset getDataset()
        Returns the underlying dataset.
        Returns:
        the underlying dataset
      • removeClassAttributes

        public void removeClassAttributes()
        Removes all set class attributes.
        Not implemented!
        Specified by:
        removeClassAttributes in interface Dataset
      • isClassAttribute

        public boolean isClassAttribute(String colKey)
        Returns whether the specified column is a class attribute.
        Specified by:
        isClassAttribute in interface Dataset
        Parameters:
        colKey - the key of the column to query
        Returns:
        true if the column is a class attribute
      • isClassAttribute

        public boolean isClassAttribute(int colIndex)
        Returns whether the specified column is a class attribute.
        Specified by:
        isClassAttribute in interface Dataset
        Parameters:
        colIndex - the index of the column to query
        Returns:
        true if the column is a class attribute
      • isClassAttributeByName

        public boolean isClassAttributeByName(String name)
        Returns whether the specified column is a class attribute.
        Specified by:
        isClassAttributeByName in interface Dataset
        Parameters:
        name - the name of the column to query
        Returns:
        true if the column is a class attribute
      • setClassAttribute

        public boolean setClassAttribute(String colKey,
                                         boolean isClass)
        Sets the class attribute status for a column.
        Not implemented!
        Specified by:
        setClassAttribute in interface Dataset
        Parameters:
        colKey - the column to set the class attribute status for
        isClass - if true then the column will be flagged as class attribute, otherwise the flag will get removed
        Returns:
        true if successfully updated
      • setClassAttribute

        public boolean setClassAttribute(int colIndex,
                                         boolean isClass)
        Sets the class attribute status for a column.
        Not implemented!
        Specified by:
        setClassAttribute in interface Dataset
        Parameters:
        colIndex - the column to set the class attribute status for
        isClass - if true then the column will be flagged as class attribute, otherwise the flag will get removed
        Returns:
        true if successfully updated
      • setClassAttributeByName

        public boolean setClassAttributeByName(String name,
                                               boolean isClass)
        Sets the class attribute status for a column.
        Not implemented!
        Specified by:
        setClassAttributeByName in interface Dataset
        Parameters:
        name - the name of the column to set the class attribute status for
        isClass - if true then the column will be flagged as class attribute, otherwise the flag will get removed
        Returns:
        true if successfully updated
      • getClassAttributeKeys

        public String[] getClassAttributeKeys()
        Returns all the class attributes that are currently set.
        Specified by:
        getClassAttributeKeys in interface Dataset
        Returns:
        the column keys of class attributes (not ordered)
      • getClassAttributeNames

        public String[] getClassAttributeNames()
        Returns all the class attributes that are currently set.
        Specified by:
        getClassAttributeNames in interface Dataset
        Returns:
        the column names of class attributes (not ordered)
      • getClassAttributeIndices

        public int[] getClassAttributeIndices()
        Returns all the class attributes that are currently set.
        Specified by:
        getClassAttributeIndices in interface Dataset
        Returns:
        the indices of class attributes (sorted asc)
      • getInputs

        public SpreadSheet getInputs()
        Returns a spreadsheet containing only the input columns, not class columns.
        Specified by:
        getInputs in interface Dataset
        Returns:
        the input features, null if data consists only of class columns
      • getOutputs

        public SpreadSheet getOutputs()
        Returns a spreadsheet containing only output columns, i.e., the class columns.
        Specified by:
        getOutputs in interface Dataset
        Returns:
        the output features, null if data has no class columns
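getClassAttributeIndices, getInputs and getOutputs together partition the columns into class (output) columns and input columns. The sketch below shows that partition on 0-based column indices. ClassColumns is hypothetical and receives its class columns through the constructor, because DatasetView itself reports setClassAttribute as not implemented. Unlike getInputs, it returns an empty array instead of null when every column is a class column.

```java
import java.util.TreeSet;
import java.util.stream.IntStream;

/** Hypothetical bookkeeping of class (output) columns by 0-based index. */
final class ClassColumns {
    private final int numColumns;
    private final TreeSet<Integer> classIndices = new TreeSet<>();  // iterates in ascending order

    ClassColumns(int numColumns, int... classIndices) {
        this.numColumns = numColumns;
        for (int index : classIndices) {
            this.classIndices.add(index);
        }
    }

    boolean isClassAttribute(int colIndex) {
        return classIndices.contains(colIndex);
    }

    // sorted ascending, matching the getClassAttributeIndices() contract
    int[] getClassAttributeIndices() {
        return classIndices.stream().mapToInt(Integer::intValue).toArray();
    }

    // columns that getInputs() would keep (empty if all are class columns)
    int[] getInputIndices() {
        return IntStream.range(0, numColumns).filter(i -> !isClassAttribute(i)).toArray();
    }
}
```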
      • toMatrix

        public Object[][] toMatrix()
        Returns the spreadsheet as matrix, with the header as the first row. Missing values are represented as null values.
        Specified by:
        toMatrix in interface SpreadSheet
        Returns:
        the row-wise matrix
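The matrix layout described for toMatrix (header as the first row, missing values as null) can be sketched as follows. MatrixExport is hypothetical and takes the header names and already-extracted row values as input, with null marking a missing cell.

```java
import java.util.List;

/** Hypothetical builder for a row-wise matrix with the header in row 0. */
final class MatrixExport {
    static Object[][] toMatrix(List<String> header, List<Object[]> rows) {
        Object[][] result = new Object[rows.size() + 1][];
        result[0] = header.toArray();              // first row: column names
        for (int i = 0; i < rows.size(); i++) {
            result[i + 1] = rows.get(i).clone();   // copy; null entries stay null (missing)
        }
        return result;
    }
}
```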
      • toString

        public String toString()
        Returns the spreadsheet as string, i.e., CSV formatted.
        Specified by:
        toString in interface SpreadSheet
        Overrides:
        toString in class Object
        Returns:
        the string representation
      • toView

        public DatasetView toView(int[] rows,
                                  int[] columns)
        Creates a view of the spreadsheet with the specified rows/columns.
        Specified by:
        toView in interface SpreadSheet
        Parameters:
        rows - the rows to use, null for all
        columns - the columns to use, null for all
        Returns:
        the view
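This page does not state whether toView on a DatasetView builds the new view on top of this view or re-maps it against the underlying dataset. If the rows and columns passed to toView refer to this view's own indices, they can be flattened into underlying indices as sketched below. IndexComposition is a hypothetical helper, and "null for all" is applied at both levels.

```java
import java.util.stream.IntStream;

/** Hypothetical helper flattening a view-of-a-view into underlying indices. */
final class IndexComposition {
    /**
     * @param base      the existing view's subset (null = all 'size' underlying indices)
     * @param selection indices chosen within the existing view (null = all of them)
     * @param size      number of underlying rows (or columns)
     */
    static int[] compose(int[] base, int[] selection, int size) {
        int[] outer = (base == null) ? IntStream.range(0, size).toArray() : base;
        if (selection == null) {
            return outer.clone();
        }
        int[] result = new int[selection.length];
        for (int i = 0; i < selection.length; i++) {
            result[i] = outer[selection[i]];  // view index -> underlying index
        }
        return result;
    }
}
```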