Class DataSet

  • All Implemented Interfaces:
    Iterable<DataObject>

    public class DataSet
    extends Object
    implements Iterable<DataObject>
    A set of DataObjects. Internally it is backed by an ArrayList<DataObject>, so it should be used with care. For fast operations once the data has been collected, use getFeaturesAsArray(). When access speed is not a concern, the iterator can be used to go through the data with a for-each loop.
    Author:
    Fernando Sanchez Villaamil
    See Also:
    getFeaturesAsArray()
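The two access paths described above can be sketched with a simplified, hypothetical stand-in. SimpleDataSet, SimpleDataSetDemo, add() and toArray() are illustrative names, not part of this API, and each object is reduced to its feature vector (double[]) with no label or id. The sketch stores one row per object, which is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for DataSet: each object is reduced to
// its feature vector (double[]); labels and ids are omitted.
class SimpleDataSet implements Iterable<double[]> {
    private final List<double[]> objects = new ArrayList<>();

    void add(double[] features) {
        objects.add(features);
    }

    // Convenient path: enables for-each loops over the stored objects.
    @Override
    public Iterator<double[]> iterator() {
        return objects.iterator();
    }

    // Fast path: a plain array whose rows are the stored (uncopied) vectors.
    double[][] toArray() {
        return objects.toArray(new double[0][]);
    }
}

class SimpleDataSetDemo {
    // Sums all feature values through the iterator (convenient).
    static double sumWithForEach(SimpleDataSet set) {
        double sum = 0;
        for (double[] features : set) {
            for (double v : features) sum += v;
        }
        return sum;
    }

    // Sums all feature values through a plain array (suited to tight loops).
    static double sumWithArray(SimpleDataSet set) {
        double[][] data = set.toArray();
        double sum = 0;
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data[i].length; j++) sum += data[i][j];
        }
        return sum;
    }
}
```

Both helpers visit the same values; the array version avoids creating an Iterator and indexes memory directly, which is the kind of loop the class description recommends for speed-critical code.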
    • Constructor Detail

      • DataSet

        public DataSet(int nrOfDimensions)
        Creates an empty set.
        Parameters:
        nrOfDimensions - The dimension the DataObjects should have.
        Throws:
        praktikum.framework.data.InconsistentDimensionException
        See Also:
        DataObject
      • DataSet

        public DataSet(DataObject newData)
        Creates a set containing only the given object. The dimension of the DataSet is taken from this DataObject.
        Parameters:
        newData - The first DataObject added to the set.
        Throws:
        praktikum.framework.data.InconsistentDimensionException
        See Also:
        DataObject
    • Method Detail

      • addObject

        public void addObject(DataObject newData)
        Adds a DataObject to the set.
        Parameters:
        newData - The DataObject to be added.
        Throws:
        praktikum.framework.data.InconsistentDimensionException - Thrown when the dimension of the object to be added does not match the dimension of the set.
        See Also:
        DataObject
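The dimension rule shared by the constructors and addObject() can be sketched as follows. This is a hypothetical illustration: CheckedSet and DimensionMismatchException are invented names, the latter standing in for praktikum.framework.data.InconsistentDimensionException (modelled as unchecked because the signatures above declare no throws clause), and objects are plain double[] feature vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for praktikum.framework.data.InconsistentDimensionException.
class DimensionMismatchException extends RuntimeException {
    DimensionMismatchException(String message) {
        super(message);
    }
}

// Hypothetical sketch: the set fixes its dimension at construction time and
// rejects feature vectors of any other length.
class CheckedSet {
    private final int nrOfDimensions;
    private final List<double[]> objects = new ArrayList<>();

    // Mirrors DataSet(int nrOfDimensions): an empty set with a fixed dimension.
    CheckedSet(int nrOfDimensions) {
        this.nrOfDimensions = nrOfDimensions;
    }

    // Mirrors DataSet(DataObject newData): the first object defines the dimension.
    CheckedSet(double[] first) {
        this(first.length);
        add(first);
    }

    // Mirrors addObject(): refuses objects whose dimension does not match.
    void add(double[] features) {
        if (features.length != nrOfDimensions) {
            throw new DimensionMismatchException(
                "expected " + nrOfDimensions + " dimensions, got " + features.length);
        }
        objects.add(features);
    }

    int size() {
        return objects.size();
    }

    int getNrOfDimensions() {
        return nrOfDimensions;
    }
}
```

Checking on every insertion is what lets getNrOfDimensions() promise a single dimension for the whole set without rescanning the data.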
      • getObject

        public DataObject getObject(int index)
        Returns the DataObject at the given position. Use with care. For fast operations once the data has been collected, use getFeaturesAsArray().
        Parameters:
        index - The position of the DataObject to be returned.
        Returns:
        The DataObject at the given position.
        See Also:
        DataObject
      • size

        public int size()
        Returns the size of the set.
        Returns:
        The size of the set.
      • getNrOfDimensions

        public int getNrOfDimensions()
        Returns the dimension of the objects in the DataSet. The DataSet ensures that all objects have the same dimension.
        Returns:
        The dimension of the DataObjects in the DataSet.
        See Also:
        DataObject
      • getNrOfClasses

        public int getNrOfClasses()
        Counts the number of classes present in the data set. Warning: it does not check whether all classes are contained.
        Returns:
        the number of distinct class labels
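Counting distinct labels, and the caveat above, can be illustrated with a small hypothetical sketch. ClassCounter and countDistinctLabels are invented names, and class labels are assumed to be ints.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: counts distinct class labels, assuming labels are ints.
class ClassCounter {
    static int countDistinctLabels(int[] labels) {
        Set<Integer> seen = new HashSet<>();
        for (int label : labels) {
            seen.add(label); // duplicates are ignored by the set
        }
        return seen.size();
    }
}
```

For labels {0, 0, 2, 2} this sketch reports 2, even though label 1 never occurs. Nothing in it checks that every expected class is present, which mirrors the warning in the description.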
      • getFeaturesAsArray

        public double[][] getFeaturesAsArray()
        Returns an array with the features of all objects in the set. Be aware that the features are not copied, so any change to a value in this array also changes the original DataObjects.
        Returns:
        A double[][] with the features. The first dimension is the feature number, the second the values in the feature.
        See Also:
        DataObject
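The aliasing warning above can be demonstrated with plain arrays. This is a hypothetical sketch: AliasingDemo, shared() and deepCopy() are invented names, and each row stands for one object's stored feature vector, a layout chosen only for illustration.

```java
// Hypothetical sketch of the aliasing described above: handing out the stored
// feature arrays versus handing out a deep copy.
class AliasingDemo {
    // Copies only the outer array; the rows are still the original arrays,
    // so writes through the result reach the stored data.
    static double[][] shared(double[][] stored) {
        return stored.clone();
    }

    // Defensive alternative: copies every row, so writes do not propagate.
    static double[][] deepCopy(double[][] stored) {
        double[][] copy = new double[stored.length][];
        for (int i = 0; i < stored.length; i++) {
            copy[i] = stored[i].clone();
        }
        return copy;
    }
}
```

Skipping the row copies saves time and memory, which fits the method's purpose, but the caller must treat the returned values as read-only unless changing the DataObjects is intended.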
      • getDataObjectArray

        public DataObject[] getDataObjectArray()
        Returns an array of all the DataObjects in the set. Use it for fast access when the labels and/or ids are also needed.
        Returns:
        An array with all the DataObjects in the set.
        See Also:
        DataObject
      • getDataSetsPerClass

        public DataSet[] getDataSetsPerClass() throws Exception
        Separates the objects in this data set according to their class labels.
        Returns:
        an array of DataSets, one for each class
        Throws:
        Exception
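Splitting by class label amounts to a group-by. The following is a hypothetical sketch: ClassSplitter and splitByLabel are invented names, labels are assumed to be ints, and the groups are returned in ascending label order. Whether getDataSetsPerClass() orders its result the same way is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: splits feature vectors into one group per class label.
// Labels are assumed to be ints; groups come back in ascending label order.
class ClassSplitter {
    static List<List<double[]>> splitByLabel(double[][] features, int[] labels) {
        if (features.length != labels.length) {
            throw new IllegalArgumentException("features and labels differ in length");
        }
        // TreeMap keeps the keys (labels) sorted.
        Map<Integer, List<double[]>> groups = new TreeMap<>();
        for (int i = 0; i < features.length; i++) {
            groups.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(features[i]);
        }
        return new ArrayList<>(groups.values());
    }
}
```

Objects keep their original relative order inside each group, because they are appended in the order they are visited.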
      • getVariances

        public double[] getVariances()
        Calculates the variance of this data set for each dimension.
        Returns:
        double array containing the variance per dimension
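A per-dimension variance can be computed in two passes: first the mean of each dimension, then the mean squared deviation from it. This is a hypothetical sketch. VarianceSketch and variances are invented names, the rows are equal-length feature vectors, and the population variance (division by n) is an assumption; the real method may divide by n - 1.

```java
// Hypothetical sketch: per-dimension variance over rows of equal length.
// Assumption: population variance (divide by n); getVariances() may use n - 1.
class VarianceSketch {
    static double[] variances(double[][] rows) {
        int n = rows.length;
        int d = rows[0].length; // assumes at least one row

        // Pass 1: mean of each dimension.
        double[] mean = new double[d];
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) mean[j] += row[j];
        }
        for (int j = 0; j < d; j++) mean[j] /= n;

        // Pass 2: average squared deviation from the mean.
        double[] var = new double[d];
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) {
                double diff = row[j] - mean[j];
                var[j] += diff * diff;
            }
        }
        for (int j = 0; j < d; j++) var[j] /= n;
        return var;
    }
}
```

For the rows {1, 10} and {3, 10}, the means are {2, 10} and the variances are {1, 0}: the first dimension deviates by 1 in both rows, and the second does not vary at all.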
      • iterator

        public Iterator<DataObject> iterator()
        Returns an iterator over the set, which allows a for-each loop over the DataObjects. This is a convenient way of going through the data, but its use is not recommended when access speed matters.
        Specified by:
        iterator in interface Iterable<DataObject>
        Returns:
        An iterator for the set.
        See Also:
        DataObject
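Because the class implements Iterable<DataObject>, a loop such as for (DataObject o : set) is compiled into an explicit Iterator loop. The generic sketch below shows both forms side by side. ForEachSketch and its methods are illustrative names, and any Iterable (for example a java.util.List) can stand in for a DataSet.

```java
import java.util.Iterator;

// Sketch of what a for-each loop over any Iterable does: the compiler turns
// `for (T x : items)` into an explicit Iterator loop like the second method.
class ForEachSketch {
    static <T> int countWithForEach(Iterable<T> items) {
        int n = 0;
        for (T ignored : items) {
            n++;
        }
        return n;
    }

    static <T> int countWithIterator(Iterable<T> items) {
        int n = 0;
        Iterator<T> it = items.iterator();
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }
}
```

Each step goes through hasNext() and next() calls on an Iterator object, which is the per-element overhead that an index loop over an array avoids.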
      • toString

        public String toString()
        Returns a String representation of all the DataObjects in the set, as a list of the representations implemented by those objects.
        Overrides:
        toString in class Object
        Returns:
        A String representation of all the elements in the set.
        See Also:
        DataObject
      • manipulateIds

        public void manipulateIds()
        Resets the ids so that the set contains ids from 0 to noOfObjects-1.
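Renumbering ids to 0..noOfObjects-1 can be sketched as a single pass over the stored objects. This is hypothetical: IdHolder and IdRenumbering are invented names, the id is modelled as a mutable int field, and assigning ids in the current storage order is an assumption.

```java
import java.util.List;

// Hypothetical object with a mutable id; the real DataObject's fields may differ.
class IdHolder {
    int id;

    IdHolder(int id) {
        this.id = id;
    }
}

class IdRenumbering {
    // Assigns ids 0..n-1 following the current list order (assumed ordering).
    static void resetIds(List<IdHolder> objects) {
        for (int i = 0; i < objects.size(); i++) {
            objects.get(i).id = i;
        }
    }
}
```

After the call, ids are unique and contiguous even if the previous ids contained duplicates or gaps.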
      • clear

        public void clear()