Class RemoveWithLabels

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler

    public class RemoveWithLabels
    extends weka.filters.SimpleBatchFilter
    Allows the user to remove nominal labels via a regular expression.

    Valid options are:

     -index <value>
      The index of the attribute to process. An index is a number starting with 1. Apart from attribute names (case-sensitive), the following placeholders can be used: first, second, third, last_2, last_1, last. Numeric indices can be enforced by preceding them with '#' (e.g., '#12'); attribute names can be surrounded by double quotes.
      (default: index=last, max=-1)
     -label-regexp <value>
      The regular expression for matching the labels to remove.
      (default: ^(label1|label2|label3)$)
     -invert
      If enabled, the matching sense is inverted, i.e., the matching labels are kept and all others removed.
     -update-header
      If enabled, the labels also get removed from the attribute definition.
     -output-debug-info
      If set, filter is run in debug mode and
      may output additional info to the console
     -do-not-check-capabilities
      If set, filter capabilities are not checked before filter is built
      (use with caution).
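The interaction between -label-regexp and -invert can be illustrated with a minimal standalone sketch. It uses only java.util.regex and does not call the filter itself; the class LabelMatchSketch and the method labelsToRemove are hypothetical names introduced for illustration. It assumes that the expression has to match the whole label (the default expression is anchored with ^ and $, so it behaves the same either way).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the filter's API.
public class LabelMatchSketch {

  /**
   * Returns the labels that would be removed for the given expression.
   * Assumption: the expression must match the whole label.
   */
  public static List<String> labelsToRemove(List<String> labels, String regexp, boolean invert) {
    Pattern pattern = Pattern.compile(regexp);
    List<String> result = new ArrayList<>();
    for (String label : labels) {
      boolean matches = pattern.matcher(label).matches();
      // without -invert: matching labels are removed
      // with -invert: matching labels are kept, all others are removed
      if (matches != invert)
        result.add(label);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList("label1", "label2", "other");
    String regexp = "^(label1|label2|label3)$";  // the documented default
    System.out.println(labelsToRemove(labels, regexp, false));  // prints [label1, label2]
    System.out.println(labelsToRemove(labels, regexp, true));   // prints [other]
  }
}
```

Whether the removed labels also disappear from the attribute definition is a separate decision, controlled by -update-header.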
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Method Summary

      Modifier and Type Method Description
      protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns it.
      weka.core.Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      protected WekaAttributeIndex getDefaultIndex()
      Returns the default attribute index.
      protected adams.core.base.BaseRegExp getDefaultLabelRegExp()
      Returns the default label regular expression.
      WekaAttributeIndex getIndex()
      Returns the index of the attribute to process.
      boolean getInvert()
      Returns whether to invert the matching sense.
      adams.core.base.BaseRegExp getLabelRegExp()
      Returns the regular expression for matching the labels to remove.
      String[] getOptions()
      Gets the current settings of the filter.
      String getRevision()
      Returns the revision string.
      boolean getUpdateHeader()
      Returns whether to remove the labels also from the attribute definition.
      String globalInfo()
      Returns a string describing this filter.
      String indexTipText()
      Returns the tip text for this property.
      String invertTipText()
      Returns the tip text for this property.
      String labelRegExpTipText()
      Returns the tip text for this property.
      Enumeration listOptions()
      Returns an enumeration describing the available options.
      static void main(String[] args)
      Main method for testing this class.
      protected weka.core.Instances process(weka.core.Instances instances)
      Processes the given data (may change the provided dataset) and returns the modified version.
      void setIndex(WekaAttributeIndex value)
      Sets the index of the attribute to process.
      void setInvert(boolean value)
      Sets whether to invert the matching sense.
      void setLabelRegExp(adams.core.base.BaseRegExp value)
      Sets the regular expression for matching the labels to remove.
      void setOptions(String[] options)
      Parses a list of options for this object.
      void setUpdateHeader(boolean value)
      Sets whether to remove the labels also from the attribute definition.
      String updateHeaderTipText()
      Returns the tip text for this property.
      • Methods inherited from class weka.filters.SimpleBatchFilter

        allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input
      • Methods inherited from class weka.filters.SimpleFilter

        reset, setInputFormat
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • m_LabelRegExp

        protected adams.core.base.BaseRegExp m_LabelRegExp
        the regular expression for matching the labels to remove.
      • m_Invert

        protected boolean m_Invert
        whether to invert the matching.
      • m_UpdateHeader

        protected boolean m_UpdateHeader
        whether to update the header.
      • m_LabelMapping

        protected Map<Integer,Integer> m_LabelMapping
        the label mapping (old -> new).
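The page only states that this field maps old to new; how the filter fills it is not documented here. As a minimal sketch of what such a mapping could look like, the following standalone code assumes that the keys are the original label indices of the kept labels and the values are their positions in the shortened label list. LabelMappingSketch and buildMapping are hypothetical names, and whole-label regular expression matching is assumed as well.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not the filter's actual implementation.
public class LabelMappingSketch {

  /**
   * Builds an old-index -> new-index mapping for the labels that are kept.
   * Removed labels get no entry (assumption).
   */
  public static Map<Integer, Integer> buildMapping(List<String> labels, String regexp, boolean invert) {
    Pattern pattern = Pattern.compile(regexp);
    Map<Integer, Integer> mapping = new LinkedHashMap<>();
    int next = 0;  // next free index in the shortened label list
    for (int i = 0; i < labels.size(); i++) {
      boolean matches = pattern.matcher(labels.get(i)).matches();
      boolean remove = (matches != invert);
      if (!remove) {
        mapping.put(i, next);
        next++;
      }
    }
    return mapping;
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList("a", "b", "c", "d");
    // "b" is removed, so "c" and "d" shift down by one
    System.out.println(buildMapping(labels, "^(b)$", false));  // prints {0=0, 2=1, 3=2}
  }
}
```

A mapping of this kind is only needed when the header is updated, since otherwise the label indices in the attribute definition stay unchanged.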
    • Constructor Detail

      • RemoveWithLabels

        public RemoveWithLabels()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions(String[] options)
                        throws Exception
        Parses a list of options for this object.
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • getOptions

        public String[] getOptions()
        Gets the current settings of the filter.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        an array of strings suitable for passing to setOptions
      • getDefaultIndex

        protected WekaAttributeIndex getDefaultIndex()
        Returns the default attribute index.
        Returns:
        the default
      • setIndex

        public void setIndex(WekaAttributeIndex value)
        Sets the index of the attribute to process.
        Parameters:
        value - the index
      • getIndex

        public WekaAttributeIndex getIndex()
        Returns the index of the attribute to process.
        Returns:
        the index
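The index syntax described for the -index option (1-based numbers, placeholders, '#' prefix, quoted attribute names) can be sketched in standalone form. This is not the WekaAttributeIndex implementation; AttributeIndexSketch and resolve are hypothetical names. The sketch assumes that last_1 and last_2 denote the second-to-last and third-to-last attribute, and that an unprefixed string is tried as an attribute name before it is parsed as a number (which would explain why '#' exists to enforce numeric interpretation).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the documented index syntax.
public class AttributeIndexSketch {

  /** Returns the 0-based attribute position, or -1 if it cannot be resolved. */
  public static int resolve(String index, List<String> names) {
    int n = names.size();
    String s = index.trim();
    int pos;
    switch (s) {
      case "first":  pos = 0; break;
      case "second": pos = 1; break;
      case "third":  pos = 2; break;
      case "last_2": pos = n - 3; break;  // assumption: third-to-last
      case "last_1": pos = n - 2; break;  // assumption: second-to-last
      case "last":   pos = n - 1; break;
      default:       pos = resolveOther(s, names);
    }
    return (pos >= 0 && pos < n) ? pos : -1;
  }

  private static int resolveOther(String s, List<String> names) {
    // '#12' enforces a numeric (1-based) index
    if (s.startsWith("#"))
      return parseOneBased(s.substring(1));
    // "name" in double quotes is always an attribute name
    if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\""))
      return names.indexOf(s.substring(1, s.length() - 1));
    // assumption: names (case-sensitive) take precedence over numbers
    int byName = names.indexOf(s);
    if (byName >= 0)
      return byName;
    return parseOneBased(s);
  }

  private static int parseOneBased(String s) {
    try {
      return Integer.parseInt(s) - 1;
    }
    catch (NumberFormatException e) {
      return -1;
    }
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("a", "b", "12", "d");
    System.out.println(resolve("last", names));  // prints 3
    System.out.println(resolve("#2", names));    // prints 1
    System.out.println(resolve("12", names));    // prints 2 (matched as a name)
  }
}
```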
      • indexTipText

        public String indexTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultLabelRegExp

        protected adams.core.base.BaseRegExp getDefaultLabelRegExp()
        Returns the default label regular expression.
        Returns:
        the default
      • setLabelRegExp

        public void setLabelRegExp(adams.core.base.BaseRegExp value)
        Sets the regular expression for matching the labels to remove.
        Parameters:
        value - the expression
      • getLabelRegExp

        public adams.core.base.BaseRegExp getLabelRegExp()
        Returns the regular expression for matching the labels to remove.
        Returns:
        the expression
      • labelRegExpTipText

        public String labelRegExpTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setInvert

        public void setInvert(boolean value)
        Sets whether to invert the matching sense.
        Parameters:
        value - true if the matching sense is to be inverted
      • getInvert

        public boolean getInvert()
        Returns whether to invert the matching sense.
        Returns:
        true if the matching sense is inverted
      • invertTipText

        public String invertTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setUpdateHeader

        public void setUpdateHeader(boolean value)
        Sets whether to remove the labels also from the attribute definition.
        Parameters:
        value - true if the header is to be updated
      • getUpdateHeader

        public boolean getUpdateHeader()
        Returns whether to remove the labels also from the attribute definition.
        Returns:
        true if the header is updated
      • updateHeaderTipText

        public String updateHeaderTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the Capabilities of this filter.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this object
        See Also:
        Capabilities
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns it. If the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, this method is called from batchFinished().
        Specified by:
        determineOutputFormat in class weka.filters.SimpleFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
        See Also:
        SimpleBatchFilter.hasImmediateOutputFormat(), SimpleBatchFilter.batchFinished()
      • process

        protected weka.core.Instances process(weka.core.Instances instances)
                                       throws Exception
        Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
        Specified by:
        process in class weka.filters.SimpleFilter
        Parameters:
        instances - the data to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong
        See Also:
        SimpleBatchFilter.batchFinished()
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main(String[] args)
        Main method for testing this class.
        Parameters:
        args - should contain arguments to the filter: use -h for help