Class MultiLevelSplitGenerator

  • All Implemented Interfaces:
    adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.Randomizable, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.data.splitgenerator.SplitGenerator<weka.core.Instances,​WekaTrainTestSetContainer>, InstancesViewSupporter, Serializable, Iterator<WekaTrainTestSetContainer>, SplitGenerator

    public class MultiLevelSplitGenerator
    extends AbstractSplitGenerator
    implements SplitGenerator, adams.core.StoppableWithFeedback
    Generates splits based on groups extracted via regular expressions. Each combination of attribute index, regular expression and group defines one level. At each level, the data is split into groups according to that level's regexp/group, and these groups make up the train and test sets.
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
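
As a rough illustration of how a regular expression and a group string can turn an attribute value into a group ID, the sketch below uses plain `String.replaceAll`. It assumes the group string is a replacement pattern such as `$1`, which this page does not confirm. The class name `GroupIdSketch` and the sample values are hypothetical and not part of the library.

```java
class GroupIdSketch {
  // Derives a group ID by rewriting the value with the group pattern
  // (assumption: the group acts as a regex replacement string like "$1").
  static String groupId(String value, String regexp, String group) {
    return value.replaceAll(regexp, group);
  }

  public static void main(String[] args) {
    String value = "siteA-batch1-s01";  // made-up attribute value
    // A first level could group by site, a second level by batch.
    System.out.println(groupId(value, "(.*)-(.*)-(.*)", "$1"));  // prints siteA
    System.out.println(groupId(value, "(.*)-(.*)-(.*)", "$2"));  // prints batch1
  }
}
```

With one such regexp/group pair per level, the same attribute value can fall into a different group at each level.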
    • Field Detail

      • m_RegExps

        protected adams.core.base.BaseRegExp[] m_RegExps
        the regular expressions to apply to determine the grouping.
      • m_Groups

        protected adams.core.base.BaseString[] m_Groups
        the groups to generate.
      • m_Silent

        protected boolean m_Silent
        whether to suppress error output.
      • m_Stopped

        protected boolean m_Stopped
        whether the generation got stopped.
    • Constructor Detail

      • MultiLevelSplitGenerator

        public MultiLevelSplitGenerator()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the object.
        Specified by:
        globalInfo in interface adams.core.GlobalInfoSupporter
        Specified by:
        globalInfo in class adams.core.option.AbstractOptionHandler
        Returns:
        a description suitable for displaying in the GUI
      • defineOptions

        public void defineOptions()
        Adds options to the internal list of options.
        Specified by:
        defineOptions in interface adams.core.option.OptionHandler
        Overrides:
        defineOptions in class AbstractSplitGenerator
      • setIndices

        public void setIndices​(WekaAttributeIndex[] value)
        Sets the attribute indices.
        Parameters:
        value - the indices
      • getIndices

        public WekaAttributeIndex[] getIndices()
        Returns the attribute indices.
        Returns:
        the indices
      • indicesTipText

        public String indicesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setRegExps

        public void setRegExps​(adams.core.base.BaseRegExp[] value)
        Sets the regular expressions to use for extracting the groups.
        Parameters:
        value - the expressions
      • getRegExps

        public adams.core.base.BaseRegExp[] getRegExps()
        Returns the regular expressions to use for extracting the groups.
        Returns:
        the expressions
      • regExpsTipText

        public String regExpsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setGroups

        public void setGroups​(adams.core.base.BaseString[] value)
        Sets the groups to generate.
        Parameters:
        value - the groups
      • getGroups

        public adams.core.base.BaseString[] getGroups()
        Returns the groups to generate.
        Returns:
        the groups
      • groupsTipText

        public String groupsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setSilent

        public void setSilent​(boolean value)
        Sets whether to suppress error messages.
        Parameters:
        value - true to suppress error messages
      • getSilent

        public boolean getSilent()
        Returns whether to suppress error messages.
        Returns:
        true if error messages are suppressed
      • silentTipText

        public String silentTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • canRandomize

        protected boolean canRandomize()
        Returns whether randomization is enabled.
        Specified by:
        canRandomize in class AbstractSplitGenerator
        Returns:
        true if randomization is enabled
      • generateGroups

        protected List<weka.core.Instances> generateGroups​(weka.core.Instances data,
                                                           int index,
                                                           String regexp,
                                                           String group)
        Generates the groups from the data by applying the regexp/group.
        Parameters:
        data - the data to split into groups
        index - the attribute index
        regexp - the regexp to apply to the values of the attribute
        group - the group ID to generate
        Returns:
        the generated groups
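
The grouping step described for generateGroups can be sketched in plain Java. This is not the library's implementation: it works on a list of strings instead of `weka.core.Instances` and an attribute index, and it assumes the group ID is obtained via `replaceAll`. The name `GroupingSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupingSketch {
  // Buckets the values by their derived group ID, keeping first-seen order.
  static List<List<String>> generateGroups(List<String> values, String regexp, String group) {
    Map<String, List<String>> buckets = new LinkedHashMap<>();
    for (String value : values) {
      String id = value.replaceAll(regexp, group);  // assumed group ID derivation
      buckets.computeIfAbsent(id, k -> new ArrayList<>()).add(value);
    }
    return new ArrayList<>(buckets.values());
  }
}
```

For the values `a-1`, `b-1`, `a-2` with regexp `(.*)-(.*)` and group `$1`, this yields two groups: `[a-1, a-2]` and `[b-1]`.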
      • subset

        protected weka.core.Instances subset​(List<weka.core.Instances> groups,
                                             int index,
                                             boolean invert)
        Generates the subset: either the group at the specified index or the rest.
        Parameters:
        groups - the groups to use
        index - the current index
        invert - whether to invert
        Returns:
        the generated instances
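
A minimal sketch of this selection, assuming that `invert` switches from "the group at the index" to "all other groups merged". Lists of strings stand in for `weka.core.Instances`, and `SubsetSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class SubsetSketch {
  // Returns the group at 'index', or all other groups merged when 'invert' is true.
  static List<String> subset(List<List<String>> groups, int index, boolean invert) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < groups.size(); i++) {
      // include group i if it is the selected one (normal) or any other one (inverted)
      if ((i == index) != invert)
        result.addAll(groups.get(i));
    }
    return result;
  }
}
```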
      • generateSplits

        protected List<com.github.fracpete.javautils.struct.Struct2<weka.core.Instances,​weka.core.Instances>> generateSplits​(weka.core.Instances data,
                                                                                                                                   int index,
                                                                                                                                   String regexp,
                                                                                                                                   String group)
        Generates the train/test splits.
        Parameters:
        data - the data to generate the splits for
        index - the attribute index
        regexp - the regexp to apply to the values of the attribute
        group - the group ID to generate
        Returns:
        the generated splits
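
One way to picture the splits is a leave-one-group-out scheme: each group in turn becomes the test set and the remaining groups form the training set. Whether the library assigns train and test this way is an assumption; the page only states that train/test splits are generated. `SplitSketch` and its `Split` pair are hypothetical stand-ins for the `Struct2` pairs in the signature.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
  // Simple stand-in for a train/test pair.
  static class Split {
    final List<String> train;
    final List<String> test;

    Split(List<String> train, List<String> test) {
      this.train = train;
      this.test = test;
    }
  }

  // One split per group: that group is the test set, the rest is the training set (assumption).
  static List<Split> generateSplits(List<List<String>> groups) {
    List<Split> splits = new ArrayList<>();
    for (int i = 0; i < groups.size(); i++) {
      List<String> train = new ArrayList<>();
      for (int n = 0; n < groups.size(); n++) {
        if (n != i)
          train.addAll(groups.get(n));
      }
      splits.add(new Split(train, new ArrayList<>(groups.get(i))));
    }
    return splits;
  }
}
```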
      • match

        protected List<com.github.fracpete.javautils.struct.Struct2<weka.core.Instances,​weka.core.Instances>> match​(List<com.github.fracpete.javautils.struct.Struct2<weka.core.Instances,​weka.core.Instances>> trainSplits,
                                                                                                                          List<com.github.fracpete.javautils.struct.Struct2<weka.core.Instances,​weka.core.Instances>> testSplits,
                                                                                                                          int index)
        Combines train and test splits as long as there are matches. Others get dropped.
        Parameters:
        trainSplits - the training splits
        testSplits - the test splits
        Returns:
        the combined splits
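
The page does not say what counts as a match, so the following is only a sketch of the general idea of keeping matched pairs and dropping the rest. It assumes, hypothetically, that each side is keyed by a group ID and that a match means both sides contain the same key. `MatchSketch` and the map-based signature are illustrative and differ from the method above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MatchSketch {
  // Keeps only keys present on both sides; each value holds {train, test}.
  static Map<String, List<List<String>>> match(Map<String, List<String>> train,
                                               Map<String, List<String>> test) {
    Map<String, List<List<String>>> combined = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : train.entrySet()) {
      List<String> matchingTest = test.get(entry.getKey());
      if (matchingTest != null)  // training entries without a test counterpart are dropped
        combined.put(entry.getKey(), Arrays.asList(entry.getValue(), matchingTest));
    }
    // test entries without a training counterpart are never visited, so they are dropped too
    return combined;
  }
}
```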
      • generateContainers

        protected void generateContainers()
        Generates the containers.
      • checkNext

        protected boolean checkNext()
        Returns true if the iteration has more elements. (In other words, returns true if next would return an element rather than throwing an exception.)
        Specified by:
        checkNext in class AbstractSplitGenerator
        Returns:
        true if the iterator has more elements.
      • stopExecution

        public void stopExecution()
        Stops the execution.
        Specified by:
        stopExecution in interface adams.core.Stoppable
      • isStopped

        public boolean isStopped()
        Whether the execution has been stopped.
        Specified by:
        isStopped in interface adams.core.StoppableWithFeedback
        Returns:
        true if stopped
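
The interplay of stopExecution and isStopped can be sketched as a cooperative stop flag that a processing loop checks on each iteration. This is a generic pattern, not the library's code: `StoppableSketch`, the `process` method and the `volatile` modifier are assumptions made for illustration.

```java
import java.util.List;

class StoppableSketch {
  private volatile boolean stopped;  // plays the role of m_Stopped

  public void stopExecution() {
    stopped = true;
  }

  public boolean isStopped() {
    return stopped;
  }

  // Processes items until finished or stopped; returns the number processed.
  public int process(List<String> items, int stopAfter) {
    stopped = false;
    int count = 0;
    for (String item : items) {
      if (stopped)
        break;
      count++;
      if (count == stopAfter)
        stopExecution();  // simulates a stop request arriving from outside
    }
    return count;
  }
}
```

A caller can inspect `isStopped()` afterwards to tell an interrupted run from a completed one.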