Package adams.ml.data

Class InstancesView

    • Field Detail

      • m_Data

        protected weka.core.Instances m_Data
        the underlying data.
      • m_SharedStringsTable

        protected SharedStringsTable m_SharedStringsTable
        the shared string table.
    • Constructor Detail

      • InstancesView

        public InstancesView()
        Initializes the view with a dummy dataset.
      • InstancesView

        public InstancesView​(weka.core.Instances data)
        Initializes the view.
        Parameters:
        data - the data to use
    • Method Detail

      • createDummy

        protected static weka.core.Instances createDummy()
        Returns a dummy dataset.
        Returns:
        the dataset
      • getData

        public weka.core.Instances getData()
        Returns the underlying Instances.
        Returns:
        the underlying data
      • rowKeyToIndex

        protected int rowKeyToIndex​(String rowKey)
        Turns the rowKey into a row index.
        Parameters:
        rowKey - the rowKey to convert
        Returns:
        the row index, -1 if failed to convert
      • cellKeyToIndex

        protected int cellKeyToIndex​(String cellKey)
        Turns the cellKey into a column index.
        Parameters:
        cellKey - the cellKey to convert
        Returns:
        the column index, -1 if failed to convert
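        The two conversion helpers above share one contract: return a valid index, or -1 if the key cannot be converted. A minimal sketch of that contract, assuming (this is not confirmed by the documentation) that a key is simply the decimal string form of a 0-based index; the class and method names are hypothetical:

```java
// Hypothetical helper, not the ADAMS implementation.
// Assumption: a key is the decimal string form of a 0-based index.
final class KeyIndexSketch {

    /** Converts a key into an index in [0, size), or -1 if that fails. */
    static int keyToIndex(String key, int size) {
        if (key == null) {
            return -1;
        }
        try {
            int index = Integer.parseInt(key.trim());
            // Reject indices outside the available range.
            return (index >= 0 && index < size) ? index : -1;
        } catch (NumberFormatException e) {
            // Not a number, so the key cannot be mapped to a position.
            return -1;
        }
    }
}
```

        Returning -1 instead of throwing lets callers such as hasRow(String) treat an unparsable key as "not found".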
      • addComment

        public void addComment​(List<String> comment)
        Ignored.
        Specified by:
        addComment in interface SpreadSheet
        Parameters:
        comment - the comment to add
      • clear

        public void clear()
        Removes all cells, but leaves comments.
        Specified by:
        clear in interface SpreadSheet
      • getColumnName

        public String getColumnName​(int colIndex)
        Returns the name of the specified column.
        Specified by:
        getColumnName in interface SpreadSheet
        Parameters:
        colIndex - the index of the column
        Returns:
        the name of the column
      • getColumnNames

        public List<String> getColumnNames()
        Returns a list of the names of all columns (i.e., the content of the header row cells).
        Specified by:
        getColumnNames in interface SpreadSheet
        Returns:
        the names of the columns
      • hasRow

        public boolean hasRow​(int rowIndex)
        Returns whether the spreadsheet already contains the row with the given index.
        Specified by:
        hasRow in interface SpreadSheet
        Parameters:
        rowIndex - the index to look for
        Returns:
        true if the row already exists
      • hasRow

        public boolean hasRow​(String rowKey)
        Returns whether the spreadsheet already contains the row with the given key.
        Specified by:
        hasRow in interface SpreadSheet
        Parameters:
        rowKey - the key to look for
        Returns:
        true if the row already exists
      • newCell

        public Cell newCell()
        Creates a new cell.
        Specified by:
        newCell in interface SpreadSheet
        Returns:
        the new instance, null in case of an instantiation error
      • addRow

        public DataRow addRow()
        Appends a row to the spreadsheet.
        Specified by:
        addRow in interface SpreadSheet
        Returns:
        the created row
      • addRow

        public DataRow addRow​(String rowKey)
        Adds a row with the given key to the list and returns the created object. If the row already exists, then the existing row is returned instead and no new object is created.
        Specified by:
        addRow in interface SpreadSheet
        Parameters:
        rowKey - the key for the row to create
        Returns:
        the created row or the already existing row
      • insertRow

        public DataRow insertRow​(int index)
        Inserts a row at the specified location.
        Specified by:
        insertRow in interface SpreadSheet
        Parameters:
        index - the index where to insert the row
        Returns:
        the created row
      • removeRow

        public Row removeRow​(int rowIndex)
        Removes the specified row.
        Specified by:
        removeRow in interface SpreadSheet
        Parameters:
        rowIndex - the row to remove
        Returns:
        the row that was removed, null if none removed
      • removeRow

        public Row removeRow​(String rowKey)
        Removes the specified row.
        Specified by:
        removeRow in interface SpreadSheet
        Parameters:
        rowKey - the row to remove
        Returns:
        the row that was removed, null if none removed
      • insertColumn

        public void insertColumn​(int columnIndex,
                                 String header)
        Inserts a column at the specified location.

        Not implemented!
        Specified by:
        insertColumn in interface SpreadSheet
        Parameters:
        columnIndex - the position of the column
        header - the name of the column
      • insertColumn

        public void insertColumn​(int columnIndex,
                                 String header,
                                 String initial)
        Inserts a column at the specified location.

        Not implemented!
        Specified by:
        insertColumn in interface SpreadSheet
        Parameters:
        columnIndex - the position of the column
        header - the name of the column
        initial - the initial value for the cells, "null" for missing values (in that case no cells are added)
      • insertColumn

        public void insertColumn​(int columnIndex,
                                 String header,
                                 String initial,
                                 boolean forceString)
        Inserts a column at the specified location.

        Not implemented!
        Specified by:
        insertColumn in interface SpreadSheet
        Parameters:
        columnIndex - the position of the column
        header - the name of the column
        initial - the initial value for the cells, "null" for missing values (in that case no cells are added)
        forceString - whether to enforce the value to be set as string
      • removeColumn

        public boolean removeColumn​(int columnIndex)
        Removes the specified column.

        Not implemented!
        Specified by:
        removeColumn in interface SpreadSheet
        Parameters:
        columnIndex - the column to remove
        Returns:
        true if removed
      • removeColumn

        public boolean removeColumn​(String columnKey)
        Removes the specified column.

        Not implemented!
        Specified by:
        removeColumn in interface SpreadSheet
        Parameters:
        columnKey - the column to remove
        Returns:
        true if removed
      • getRow

        public DataRow getRow​(String rowKey)
        Returns the row associated with the given row key, null if not found.
        Specified by:
        getRow in interface SpreadSheet
        Parameters:
        rowKey - the key of the row to retrieve
        Returns:
        the row or null if not found
      • getRow

        public DataRow getRow​(int rowIndex)
        Returns the row at the specified index.
        Specified by:
        getRow in interface SpreadSheet
        Parameters:
        rowIndex - the 0-based index of the row to retrieve
        Returns:
        the row
      • getRowKey

        public String getRowKey​(int rowIndex)
        Returns the row key at the specified index.
        Specified by:
        getRowKey in interface SpreadSheet
        Parameters:
        rowIndex - the 0-based index of the row key to retrieve
        Returns:
        the row key
      • getRowIndex

        public int getRowIndex​(String rowKey)
        Returns the row index of the specified row.
        Specified by:
        getRowIndex in interface SpreadSheet
        Parameters:
        rowKey - the row identifier
        Returns:
        the 0-based row index, -1 if not found
      • getCellIndex

        public int getCellIndex​(String cellKey)
        Returns the cell index of the specified cell (in the header row).
        Specified by:
        getCellIndex in interface SpreadSheet
        Parameters:
        cellKey - the cell identifier
        Returns:
        the 0-based column index, -1 if not found
      • hasCell

        public boolean hasCell​(int rowIndex,
                               int columnIndex)
        Checks whether the cell with the given indices already exists.
        Specified by:
        hasCell in interface SpreadSheet
        Parameters:
        rowIndex - the index of the row to look for
        columnIndex - the index of the cell in the row to look for
        Returns:
        true if the cell exists
      • getCell

        public Cell getCell​(int rowIndex,
                            int columnIndex)
        Returns the corresponding cell or null if not found.
        Specified by:
        getCell in interface SpreadSheet
        Parameters:
        rowIndex - the index of the row the cell is in
        columnIndex - the column of the cell to retrieve
        Returns:
        the cell or null if not found
      • getCellPosition

        public String getCellPosition​(String rowKey,
                                      String cellKey)
        Returns the position of the cell or null if not found. A position combines one or more letters (for the column) with a number (for the row).
        Specified by:
        getCellPosition in interface SpreadSheet
        Parameters:
        rowKey - the key of the row the cell is in
        cellKey - the key of the cell to retrieve
        Returns:
        the position string or null if not found
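        The letters-plus-number scheme described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and whether the row number counts the header row is not stated here, so the caller passes the row number explicitly.

```java
// Hypothetical helper illustrating spreadsheet-style positions such as "B7".
final class CellPositionSketch {

    /** Converts a 0-based column index into letters: 0 -> A, 25 -> Z, 26 -> AA. */
    static String columnLetters(int colIndex) {
        if (colIndex < 0) {
            throw new IllegalArgumentException("Column index must be >= 0: " + colIndex);
        }
        StringBuilder letters = new StringBuilder();
        int n = colIndex;
        while (n >= 0) {
            // Prepend the least significant "digit", then shift to the next one.
            letters.insert(0, (char) ('A' + n % 26));
            n = n / 26 - 1;
        }
        return letters.toString();
    }

    /** Combines the column letters with a row number supplied by the caller. */
    static String position(int rowNumber, int colIndex) {
        return columnLetters(colIndex) + rowNumber;
    }
}
```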
      • sortRowKeys

        public void sortRowKeys()
        Sorts the rows according to the row keys.
        Does nothing.
        Specified by:
        sortRowKeys in interface SpreadSheet
        See Also:
        rowKeys()
      • sortRowKeys

        public void sortRowKeys​(Comparator<String> comp)
        Sorts the rows according to the row keys.
        Does nothing.
        Specified by:
        sortRowKeys in interface SpreadSheet
        Parameters:
        comp - the comparator to use
        See Also:
        rowKeys()
      • sort

        public void sort​(int index,
                         boolean asc)
        Sorts the rows based on the values in the specified column.

        NB: the row keys will change!
        Specified by:
        sort in interface SpreadSheet
        Parameters:
        index - the index (0-based) of the column to sort on
        asc - whether sorting is ascending or descending
        See Also:
        sort(RowComparator)
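        Sorting on one column with an ascending/descending flag can be sketched with a plain comparator. The sketch below works on rows of doubles rather than on weka.core.Instances, and its names are hypothetical. If row keys are derived from row positions (an assumption here), reordering the rows would also explain the note about changing row keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for column-based sorting; not the ADAMS implementation.
final class ColumnSortSketch {

    /** Returns a sorted copy of the rows, ordered by the value at the given column. */
    static List<double[]> sortByColumn(List<double[]> rows, int index, boolean asc) {
        Comparator<double[]> comp = Comparator.comparingDouble(row -> row[index]);
        if (!asc) {
            comp = comp.reversed();
        }
        // Copy first so the caller's list keeps its original order.
        List<double[]> sorted = new ArrayList<>(rows);
        sorted.sort(comp);
        return sorted;
    }
}
```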
      • sort

        public void sort​(RowComparator comp)
        Sorts the rows using the given comparator.

        Not implemented.
        Specified by:
        sort in interface SpreadSheet
        Parameters:
        comp - the row comparator to use
      • sort

        public void sort​(RowComparator comp,
                         boolean unique)
        Sorts the rows using the given comparator.

        Not implemented.
        Specified by:
        sort in interface SpreadSheet
        Parameters:
        comp - the row comparator to use
        unique - whether to drop any duplicate rows (based on row comparator)
      • getColumnCount

        public int getColumnCount()
        Returns the number of columns.
        Specified by:
        getColumnCount in interface SpreadSheet
        Returns:
        the number of columns
      • getRowCount

        public int getRowCount()
        Returns the number of rows currently stored.
        Specified by:
        getRowCount in interface SpreadSheet
        Returns:
        the number of rows
      • isNumeric

        public boolean isNumeric​(int columnIndex)
        Checks whether the given column is numeric or not. Does not accept missing values.
        Specified by:
        isNumeric in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        Returns:
        true if purely numeric
        See Also:
        getContentTypes(int)
      • isNumeric

        public boolean isNumeric​(int columnIndex,
                                 boolean allowMissing)
        Checks whether the given column is numeric or not. Can accept missing values.
        Specified by:
        isNumeric in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        allowMissing - whether missing values are accepted
        Returns:
        true if purely numeric
        See Also:
        getContentTypes(int)
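        The difference between the two isNumeric variants is only how a missing value is treated. A minimal sketch of that check, assuming a row-wise Object matrix in which null stands for a missing value; the class name is hypothetical:

```java
// Hypothetical illustration of a "purely numeric" column check.
// Assumption: null represents a missing value.
final class NumericColumnSketch {

    /** Returns true if every value in the column is a Number (or missing, if allowed). */
    static boolean isNumeric(Object[][] rows, int columnIndex, boolean allowMissing) {
        for (Object[] row : rows) {
            Object value = row[columnIndex];
            if (value == null) {
                if (!allowMissing) {
                    return false;
                }
            } else if (!(value instanceof Number)) {
                return false;
            }
        }
        return true;
    }
}
```

        Note that this sketch reports true for an empty dataset; the documentation does not say what the real method returns in that case.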
      • isContentType

        public boolean isContentType​(int columnIndex,
                                     Cell.ContentType type)
        Checks whether the given column is of the specific content type or not.
        Specified by:
        isContentType in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        type - the content type to check
        Returns:
        true if column purely consists of this content type
        See Also:
        getContentType(int)
      • getContentType

        public Cell.ContentType getContentType​(int columnIndex)
        Returns the pure content type of the given column, if available.
        Specified by:
        getContentType in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        Returns:
        the content type that this column consists of solely, null if mixed
      • getContentTypes

        public Collection<Cell.ContentType> getContentTypes​(int columnIndex)
        Returns all content types of the given column, if available.
        Specified by:
        getContentTypes in interface SpreadSheet
        Parameters:
        columnIndex - the index of the column to check
        Returns:
        the content types that this column consists of
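        The relationship between getContentTypes (all types found) and getContentType (the single type, or null if mixed) can be sketched as below. The Kind enum is a hypothetical stand-in for Cell.ContentType, and the classification rules are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; Kind is a stand-in for Cell.ContentType.
final class ContentTypeSketch {

    enum Kind { MISSING, LONG, DOUBLE, STRING }

    /** Classifies a single value (assumed rules, for illustration only). */
    static Kind kindOf(Object value) {
        if (value == null) {
            return Kind.MISSING;
        }
        if (value instanceof Long || value instanceof Integer) {
            return Kind.LONG;
        }
        if (value instanceof Number) {
            return Kind.DOUBLE;
        }
        return Kind.STRING;
    }

    /** Collects every kind that occurs in the column. */
    static Set<Kind> kindsOf(Object[] column) {
        Set<Kind> kinds = EnumSet.noneOf(Kind.class);
        for (Object value : column) {
            kinds.add(kindOf(value));
        }
        return kinds;
    }

    /** Returns the only kind in the column, or null if the column is mixed or empty. */
    static Kind pureKind(Object[] column) {
        Set<Kind> kinds = kindsOf(column);
        return kinds.size() == 1 ? kinds.iterator().next() : null;
    }
}
```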
      • getCellValues

        public List<String> getCellValues​(String colKey)
        Returns the unique string values of the specified column. The returned list is sorted.
        Specified by:
        getCellValues in interface SpreadSheet
        Parameters:
        colKey - the column to retrieve the values for
        Returns:
        the sorted list of unique values
      • getCellValues

        public List<String> getCellValues​(int colIndex)
        Returns the unique string values of the specified column. The returned list is sorted.
        Specified by:
        getCellValues in interface SpreadSheet
        Parameters:
        colIndex - the column to retrieve the values for
        Returns:
        the sorted list of unique values
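        Collecting the unique values of a column in sorted order is a natural fit for a TreeSet. A minimal sketch with a hypothetical class name; skipping missing (null) values is an assumption, since the documentation does not say how they are handled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of "unique, sorted string values of a column".
final class UniqueValuesSketch {

    /** Returns the distinct non-null values in natural (lexicographic) order. */
    static List<String> uniqueSorted(List<String> columnValues) {
        TreeSet<String> unique = new TreeSet<>();
        for (String value : columnValues) {
            if (value != null) {
                // Assumption: missing values are not reported.
                unique.add(value);
            }
        }
        return new ArrayList<>(unique);
    }
}
```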
      • equalsHeader

        public String equalsHeader​(SpreadSheet other)
        Compares the header of this spreadsheet with the other one.
        Specified by:
        equalsHeader in interface SpreadSheet
        Parameters:
        other - the other spreadsheet to compare with
        Returns:
        null if equal, otherwise a description of what differs
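        The "null means equal, otherwise a message" convention can be sketched as follows. The sketch compares only column names; the class name and the message wording are hypothetical, and the real method may compare more than names.

```java
import java.util.List;

// Hypothetical sketch of a header comparison that returns null on success.
final class HeaderCompareSketch {

    /** Returns null if both headers match, otherwise a description of the first difference. */
    static String compareHeaders(List<String> first, List<String> second) {
        if (first.size() != second.size()) {
            return "Number of columns differs: " + first.size() + " != " + second.size();
        }
        for (int i = 0; i < first.size(); i++) {
            if (!first.get(i).equals(second.get(i))) {
                return "Column #" + (i + 1) + " differs: '"
                    + first.get(i) + "' != '" + second.get(i) + "'";
            }
        }
        return null;
    }
}
```

        A caller would typically test the result against null and log or display the message otherwise.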
      • toMatrix

        public Object[][] toMatrix()
        Returns the spreadsheet as matrix, with the header as the first row. Missing values are represented as null values.
        Specified by:
        toMatrix in interface SpreadSheet
        Returns:
        the row-wise matrix
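        The matrix layout described above (header in the first row, null for missing values) can be sketched like this. The input types and names are hypothetical; the sketch also treats cells beyond the end of a short row as missing, which is an assumption.

```java
import java.util.List;

// Hypothetical sketch of a row-wise matrix with the header as first row.
final class MatrixSketch {

    /** Builds a (rows + 1) x columns matrix; missing cells stay null. */
    static Object[][] toMatrix(List<String> header, List<Object[]> rows) {
        Object[][] result = new Object[rows.size() + 1][header.size()];
        for (int c = 0; c < header.size(); c++) {
            result[0][c] = header.get(c);
        }
        for (int r = 0; r < rows.size(); r++) {
            Object[] row = rows.get(r);
            for (int c = 0; c < header.size(); c++) {
                // Assumption: a row shorter than the header has missing cells.
                result[r + 1][c] = (c < row.length) ? row[c] : null;
            }
        }
        return result;
    }
}
```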
      • removeMissing

        public boolean removeMissing()
        Removes all cells marked "missing".
        Specified by:
        removeMissing in interface SpreadSheet
        Returns:
        true if any cell was removed
      • setDateLenient

        public void setDateLenient​(boolean value)
        Sets whether parsing of dates is to be lenient or not.
        Specified by:
        setDateLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
        See Also:
        DateFormat.setLenient(boolean)
      • setDateTimeLenient

        public void setDateTimeLenient​(boolean value)
        Sets whether parsing of date/times is to be lenient or not.
        Specified by:
        setDateTimeLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
        See Also:
        DateFormat.setLenient(boolean)
      • setDateTimeMsecLenient

        public void setDateTimeMsecLenient​(boolean value)
        Sets whether parsing of date/time msecs is to be lenient or not.
        Specified by:
        setDateTimeMsecLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
        See Also:
        DateFormat.setLenient(boolean)
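        The lenient setters above refer to DateFormat.setLenient(boolean). The following sketch shows what that flag changes, using java.text.SimpleDateFormat directly; the pattern "yyyy-MM-dd" and the class name are chosen for illustration and are not taken from the spreadsheet classes.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Illustration of DateFormat.setLenient(boolean) with a hypothetical pattern.
final class LenientDateSketch {

    /** Returns true if the text can be parsed with the given leniency. */
    static boolean parses(String text, boolean lenient) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(lenient);
        try {
            format.parse(text);
            return true;
        } catch (ParseException e) {
            // Strict parsing rejects impossible dates such as February 30.
            return false;
        }
    }
}
```

        With lenient parsing, an impossible date like 2023-02-30 is accepted and rolled over into March; with strict parsing it is rejected.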
      • setTimeLenient

        public void setTimeLenient​(boolean value)
        Sets whether parsing of times is to be lenient or not.
        Specified by:
        setTimeLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
      • isTimeLenient

        public boolean isTimeLenient()
        Returns whether the parsing of times is lenient or not.
        Specified by:
        isTimeLenient in interface SpreadSheet
        Returns:
        true if parsing is lenient
      • setTimeMsecLenient

        public void setTimeMsecLenient​(boolean value)
        Sets whether parsing of times/msec is to be lenient or not.
        Specified by:
        setTimeMsecLenient in interface SpreadSheet
        Parameters:
        value - if true lenient parsing is used, otherwise not
      • isTimeMsecLenient

        public boolean isTimeMsecLenient()
        Returns whether the parsing of times/msec is lenient or not.
        Specified by:
        isTimeMsecLenient in interface SpreadSheet
        Returns:
        true if parsing is lenient
      • setLocale

        public void setLocale​(Locale value)
        Sets the locale. Used in formatting/parsing numbers.
        Specified by:
        setLocale in interface LocaleSupporter
        Specified by:
        setLocale in interface SpreadSheet
        Parameters:
        value - the locale to use
      • calculate

        public void calculate()
        Triggers all formula cells to recalculate their values.
        Specified by:
        calculate in interface SpreadSheet
      • assign

        public void assign​(SpreadSheet sheet)
        Clears this spreadsheet and copies all the data from the given one.
        Specified by:
        assign in interface SpreadSheet
        Parameters:
        sheet - the data to copy
      • getDataRowClass

        public Class getDataRowClass()
        Returns the class used for rows.
        Specified by:
        getDataRowClass in interface SpreadSheet
        Returns:
        the class
      • newInstance

        public SpreadSheet newInstance()
        Returns a new instance.
        Specified by:
        newInstance in interface SpreadSheet
        Returns:
        the new instance, null if failed to create new instance
      • getHeader

        public Dataset getHeader()
        Returns a spreadsheet with the same header and comments.
        Specified by:
        getHeader in interface Dataset
        Specified by:
        getHeader in interface SpreadSheet
        Returns:
        the spreadsheet
      • setName

        public void setName​(String value)
        Sets the name of the spreadsheet.
        Specified by:
        setName in interface SpreadSheet
        Parameters:
        value - the name
      • getName

        public String getName()
        Returns the name of the spreadsheet.
        Specified by:
        getName in interface SpreadSheet
        Returns:
        the name, can be null
      • hasName

        public boolean hasName()
        Returns whether the spreadsheet has a name.
        Specified by:
        hasName in interface SpreadSheet
        Returns:
        true if the spreadsheet is named
      • addComment

        public void addComment​(String comment)
        Adds the comment to the internal list of comments. If the comment contains newlines, then it gets automatically split into multiple lines and added one by one.
        Specified by:
        addComment in interface SpreadSheet
        Parameters:
        comment - the comment to add
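        The newline splitting described for this method can be sketched as below. The class is a hypothetical stand-in that keeps its own comment list; it handles both Unix and Windows line endings, which is an assumption about the intended behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting a multi-line comment into single lines.
final class CommentSketch {

    private final List<String> comments = new ArrayList<>();

    /** Splits the comment on line breaks and stores each line separately. */
    void addComment(String comment) {
        for (String line : comment.split("\\r?\\n")) {
            comments.add(line);
        }
    }

    /** Returns the stored comment lines in insertion order. */
    List<String> getComments() {
        return comments;
    }
}
```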
      • indexOfColumn

        public int indexOfColumn​(String name)
        Returns the index of the column using the specified name.
        Specified by:
        indexOfColumn in interface Dataset
        Parameters:
        name - the name of the column to locate
        Returns:
        the index, -1 if failed to locate
      • removeClassAttributes

        public void removeClassAttributes()
        Removes all set class attributes.
        Specified by:
        removeClassAttributes in interface Dataset
      • isClassAttribute

        public boolean isClassAttribute​(String colKey)
        Returns whether the specified column is a class attribute.
        Specified by:
        isClassAttribute in interface Dataset
        Parameters:
        colKey - the key of the column to query
        Returns:
        true if the column is a class attribute
      • isClassAttributeByName

        public boolean isClassAttributeByName​(String name)
        Returns whether the specified column is a class attribute.
        Specified by:
        isClassAttributeByName in interface Dataset
        Parameters:
        name - the name of the column to query
        Returns:
        true if the column is a class attribute
      • isClassAttribute

        public boolean isClassAttribute​(int colIndex)
        Returns whether the specified column is a class attribute.
        Specified by:
        isClassAttribute in interface Dataset
        Parameters:
        colIndex - the index of the column to query
        Returns:
        true if the column is a class attribute
      • setClassAttribute

        public boolean setClassAttribute​(String colKey,
                                         boolean isClass)
        Sets the class attribute status for a column.
        Specified by:
        setClassAttribute in interface Dataset
        Parameters:
        colKey - the column to set the class attribute status for
        isClass - if true then the column will be flagged as class attribute, otherwise the flag will get removed
        Returns:
        true if successfully updated
      • setClassAttributeByName

        public boolean setClassAttributeByName​(String name,
                                               boolean isClass)
        Sets the class attribute status for a column.
        Specified by:
        setClassAttributeByName in interface Dataset
        Parameters:
        name - the name of the column to set the class attribute status for
        isClass - if true then the column will be flagged as class attribute, otherwise the flag will get removed
        Returns:
        true if successfully updated
      • setClassAttribute

        public boolean setClassAttribute​(int colIndex,
                                         boolean isClass)
        Sets the class attribute status for a column.
        Specified by:
        setClassAttribute in interface Dataset
        Parameters:
        colIndex - the column to set the class attribute status for
        isClass - if true then the column will be flagged as class attribute, otherwise the flag will get removed
        Returns:
        true if successfully updated
      • getClassAttributeKeys

        public String[] getClassAttributeKeys()
        Returns all the class attributes that are currently set.
        Specified by:
        getClassAttributeKeys in interface Dataset
        Returns:
        the column keys of class attributes (not ordered)
      • getClassAttributeNames

        public String[] getClassAttributeNames()
        Returns all the class attributes that are currently set.
        Specified by:
        getClassAttributeNames in interface Dataset
        Returns:
        the column names of class attributes (not ordered)
      • getClassAttributeIndices

        public int[] getClassAttributeIndices()
        Returns all the class attributes that are currently set.
        Specified by:
        getClassAttributeIndices in interface Dataset
        Returns:
        the indices of class attributes (sorted asc)
      • getInputs

        public SpreadSheet getInputs()
        Returns a spreadsheet containing only the input columns, not class columns.
        Specified by:
        getInputs in interface Dataset
        Returns:
        the input features, null if data consists only of class columns
      • getOutputs

        public SpreadSheet getOutputs()
        Returns a spreadsheet containing only output columns, i.e., the class columns.
        Specified by:
        getOutputs in interface Dataset
        Returns:
        the output features, null if data has no class columns
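        getInputs and getOutputs partition the columns by their class attribute flag and return null when their side of the partition is empty. A minimal sketch of that partitioning on column indices only; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of splitting columns into inputs and class columns.
final class ClassColumnSketch {

    /**
     * Returns the indices (ascending) of class columns if wantClass is true,
     * otherwise of the remaining input columns; null if there are none.
     */
    static int[] select(int columnCount, Set<Integer> classIndices, boolean wantClass) {
        List<Integer> selected = new ArrayList<>();
        for (int col = 0; col < columnCount; col++) {
            if (classIndices.contains(col) == wantClass) {
                selected.add(col);
            }
        }
        if (selected.isEmpty()) {
            return null;
        }
        int[] result = new int[selected.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = selected.get(i);
        }
        return result;
    }
}
```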
      • toView

        public SpreadSheetView toView​(int[] rows,
                                      int[] columns)
        Creates a view of the spreadsheet with the specified rows/columns.
        Specified by:
        toView in interface SpreadSheet
        Parameters:
        columns - the columns to use, null for all
        rows - the rows to use, null for all
        Returns:
        the view
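        The "null for all" convention for the rows and columns arguments can be resolved with a small helper before a view is built. This is a sketch with hypothetical names, not the code used by toView:

```java
// Hypothetical helper for the "null means all" index convention.
final class ViewIndicesSketch {

    /** Returns a copy of the selection, or 0..total-1 if the selection is null. */
    static int[] resolve(int[] selected, int total) {
        if (selected != null) {
            // Defensive copy so later changes by the caller do not affect the view.
            return selected.clone();
        }
        int[] all = new int[total];
        for (int i = 0; i < total; i++) {
            all[i] = i;
        }
        return all;
    }
}
```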