Package weka.core

Class InstanceGrouping

  • All Implemented Interfaces:
    adams.core.logging.LoggingSupporter, adams.core.SizeOfHandler, Serializable

    public class InstanceGrouping
    extends adams.core.logging.LoggingObject
    Groups rows in a dataset using a regular expression on a nominal or string attribute.
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected weka.core.Instances m_Data
      the original data.
      protected String m_Group
      the replacement string, using the groups from the regexp.
      protected Map<String,gnu.trove.list.TIntList> m_Groups
      the groups.
      protected WekaAttributeIndex m_Index
      the attribute index.
      protected adams.core.base.BaseRegExp m_RegExp
      the regular expression.
      • Fields inherited from class adams.core.logging.LoggingObject

        m_Logger, m_LoggingIsEnabled, m_LoggingLevel
    • Constructor Summary

      Constructors 
      Constructor Description
      InstanceGrouping(weka.core.Instances data, WekaAttributeIndex index, adams.core.base.BaseRegExp regExp, String group)
      Initializes the object.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      weka.core.Instances collapse(weka.core.Instances data)
      Collapses the data into a fake dataset with only the group and the class attribute.
      protected weka.core.Instances collapsedHeader()
      Creates the header for the collapsed data.
      weka.core.Instances expand(weka.core.Instances data, boolean useView)
      Expands the fake data into the original dataset space.
      gnu.trove.list.TIntList expand(weka.core.Instances data, gnu.trove.list.TIntList subset)
      Expands a subset of the fake data into indices in the original dataset space.
      protected void expandCheck(weka.core.Instances data)
      Ensures that the data to expand is in the right format.
      gnu.trove.list.TIntList get(String group)
      Returns the indices in the original dataset that form the specified group.
      weka.core.Instances getData()
      Returns the underlying data.
      String getGroup()
      Returns the group expression, i.e., the replacement string (e.g., '$2').
      adams.core.base.BaseRegExp getRegExp()
      Returns the regular expression in use (e.g., '(.*)-([0-9]+)-(.*)').
      Set<String> groups()
      Returns the names of the groups.
      protected void initialize()
      Initializes the grouping.
      protected void process()
      Performs the grouping.
      int size()
      Returns the number of groups.
      String toString()
      Returns the groups and their indices.
      • Methods inherited from class adams.core.logging.LoggingObject

        configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
    • Field Detail

      • m_Data

        protected weka.core.Instances m_Data
        the original data.
      • m_RegExp

        protected adams.core.base.BaseRegExp m_RegExp
        the regular expression.
      • m_Group

        protected String m_Group
        the replacement string, using the groups from the regexp.
      • m_Groups

        protected Map<String,gnu.trove.list.TIntList> m_Groups
        the groups.
      • m_Index

        protected WekaAttributeIndex m_Index
        the attribute index.
    • Constructor Detail

      • InstanceGrouping

        public InstanceGrouping(weka.core.Instances data,
                                WekaAttributeIndex index,
                                adams.core.base.BaseRegExp regExp,
                                String group)
        Initializes the object.
        Parameters:
        data - the data to group
        index - the index of the nominal or string attribute that the regular expression is applied to
        regExp - the regular expression (e.g., '(.*)-([0-9]+)-(.*)')
        group - the replacement string, using the groups from the regexp (e.g., '$2')
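        The interplay of regExp and group can be illustrated with plain java.util.regex. The following is a minimal sketch, not the class's actual implementation: the class name GroupingSketch and its methods are hypothetical, attribute values are passed as plain strings, and a List<Integer> stands in for gnu.trove.list.TIntList. It assumes the group key is obtained by applying the replacement string to each attribute value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; not part of ADAMS or Weka. */
public class GroupingSketch {

  /**
   * Derives the group key by applying the replacement string to the value.
   * A value that does not match the expression is returned unchanged.
   */
  public static String groupKey(String value, String regExp, String group) {
    return value.replaceAll(regExp, group);
  }

  /** Maps each group key to the row indices whose value produced that key. */
  public static Map<String, List<Integer>> group(List<String> values, String regExp, String group) {
    // LinkedHashMap keeps the groups in first-seen order (an assumption of this sketch)
    Map<String, List<Integer>> groups = new LinkedHashMap<>();
    for (int i = 0; i < values.size(); i++) {
      String key = groupKey(values.get(i), regExp, group);
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Made-up attribute values of the form <prefix>-<number>-<suffix>
    List<String> ids = Arrays.asList("batchA-1-x", "batchB-2-y", "batchC-1-z");
    System.out.println(group(ids, "(.*)-([0-9]+)-(.*)", "$2"));
  }
}
```

        With the example expression and '$2', rows whose middle number is the same end up in the same group, regardless of prefix and suffix.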
    • Method Detail

      • initialize

        protected void initialize()
        Initializes the grouping.
      • process

        protected void process()
        Performs the grouping.
      • collapsedHeader

        protected weka.core.Instances collapsedHeader()
        Creates the header for the collapsed data.
        Returns:
        the header
      • collapse

        public weka.core.Instances collapse(weka.core.Instances data)
        Collapses the data into a fake dataset with only the group and the class attribute.
        Parameters:
        data - the data to collapse
        Returns:
        the collapsed dataset
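        The idea of a collapsed dataset with one row per group can be sketched without Weka. Everything here is an assumption made for illustration: the class name CollapseSketch is hypothetical, rows are modelled as String arrays of {group, class}, and the class value of the first member of a group is taken to represent the group. The documentation above does not say how the class value is actually chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; not part of ADAMS or Weka. */
public class CollapseSketch {

  /** Returns one {group, class} row per group, in the iteration order of the map. */
  public static List<String[]> collapse(Map<String, List<Integer>> groups, List<String> classValues) {
    List<String[]> rows = new ArrayList<>();
    for (Map.Entry<String, List<Integer>> entry : groups.entrySet()) {
      // Assumption: the first member's class value stands in for the whole group
      int first = entry.getValue().get(0);
      rows.add(new String[]{entry.getKey(), classValues.get(first)});
    }
    return rows;
  }

  public static void main(String[] args) {
    // Group name -> row indices in the original data (made-up values)
    Map<String, List<Integer>> groups = new LinkedHashMap<>();
    groups.put("1", Arrays.asList(0, 2));
    groups.put("2", Arrays.asList(1));
    // Class value of each original row
    List<String> classValues = Arrays.asList("yes", "no", "yes");
    for (String[] row : collapse(groups, classValues)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```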
      • expandCheck

        protected void expandCheck(weka.core.Instances data)
        Ensures that the data to expand is in the right format.
        Parameters:
        data - the data to check
        Throws:
        IllegalArgumentException - if checks fail
      • expand

        public gnu.trove.list.TIntList expand(weka.core.Instances data,
                                              gnu.trove.list.TIntList subset)
        Expands a subset of the fake data into indices in the original dataset space.
        Parameters:
        data - the data to expand
        subset - the row indices in the data to expand
        Returns:
        the expanded indices
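        One plausible reading of this overload is that each selected row of the collapsed data is replaced by the original row indices of its group. The sketch below illustrates that reading only; the class name ExpandSketch, the String list holding the group of each collapsed row, and the use of List<Integer> in place of gnu.trove.list.TIntList are all assumptions, not the library's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; not part of ADAMS or Weka. */
public class ExpandSketch {

  /**
   * Maps row indices of the collapsed data to row indices of the original data.
   *
   * @param collapsedGroups the group name of each collapsed row
   * @param groups          group name -> indices in the original data
   * @param subset          the collapsed row indices to expand
   */
  public static List<Integer> expand(List<String> collapsedGroups,
                                     Map<String, List<Integer>> groups,
                                     List<Integer> subset) {
    List<Integer> result = new ArrayList<>();
    for (int row : subset) {
      String key = collapsedGroups.get(row);
      List<Integer> members = groups.get(key);
      if (members == null)
        throw new IllegalArgumentException("Unknown group: " + key);
      // Every original row of the group is added, in group order
      result.addAll(members);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> groups = new LinkedHashMap<>();
    groups.put("1", Arrays.asList(0, 2));
    groups.put("2", Arrays.asList(1));
    // Collapsed row 0 belongs to group "1", collapsed row 1 to group "2"
    List<String> collapsedGroups = Arrays.asList("1", "2");
    System.out.println(expand(collapsedGroups, groups, Arrays.asList(1, 0)));
  }
}
```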
      • expand

        public weka.core.Instances expand(weka.core.Instances data,
                                          boolean useView)
        Expands the fake data into the original dataset space.
        Parameters:
        data - the data to expand
        useView - whether to use a view
        Returns:
        the expanded dataset
      • getData

        public weka.core.Instances getData()
        Returns the underlying data.
        Returns:
        the data
      • getRegExp

        public adams.core.base.BaseRegExp getRegExp()
        Returns the regular expression in use (e.g., '(.*)-([0-9]+)-(.*)').
        Returns:
        the regexp
      • getGroup

        public String getGroup()
        Returns the group expression, i.e., the replacement string (e.g., '$2').
        Returns:
        the group
      • size

        public int size()
        Returns the number of groups.
        Returns:
        the number of groups
      • groups

        public Set<String> groups()
        Returns the names of the groups.
        Returns:
        the group names
      • get

        public gnu.trove.list.TIntList get(String group)
        Returns the indices of the specified group.
        Parameters:
        group - the group to return
        Returns:
        the indices in the original dataset that form this group
      • toString

        public String toString()
        Returns the groups and their indices.
        Overrides:
        toString in class Object
        Returns:
        the generated string