Class NominalToNumeric

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.StreamableFilter

    public class NominalToNumeric
    extends weka.filters.SimpleStreamFilter
    Converts a nominal attribute into a numeric one, either by using the internal representation of the labels as the numeric value or by parsing the label itself (a subset of the label can be extracted via a regular expression).

    Valid options are:

     -index <value>
      The index of the attribute to convert. An index is a number starting with 1. Apart from attribute names (case-sensitive), the following placeholders can be used as well: first, second, third, last_2, last_1, last. Numeric indices can be enforced by preceding them with '#' (e.g., '#12'); attribute names can be surrounded by double quotes.
      (default: index=last, max=-1)
     -type <value>
      The type of conversion to perform.
      (default: INTERNAL_REPRESENTATION)
     -find <value>
      The regular expression to use for extracting the numeric part from the label; use .* to match the label as a whole.
      (default: .*)
     -replace <value>
      The expression to use for assembling the numeric part; use $0 to use the label as is.
      (default: $0)
     -output-debug-info
      If set, filter is run in debug mode and
      may output additional info to the console
     -do-not-check-capabilities
      If set, filter capabilities are not checked before filter is built
      (use with caution).
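    The two conversion modes described above can be illustrated with a minimal, standalone sketch. It does not use the filter itself: the class ConversionSketch and its methods are hypothetical, and it assumes that the label-parsing mode applies -find/-replace with String.replaceAll semantics before parsing the result as a double. The 0-based label position reflects how Weka stores nominal values internally.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the two conversion modes; not the filter's actual code.
final class ConversionSketch {

    // Internal representation: the numeric value is the position of the label
    // in the attribute's label list (0-based, as Weka stores nominal values).
    static double internalRepresentation(List<String> labels, String label) {
        int index = labels.indexOf(label);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return index;
    }

    // Label parsing: apply find/replace, then parse the result as a double.
    // Assumption: replaceAll semantics; unparsable results become NaN in this sketch.
    static double parseLabel(String label, String find, String replace) {
        String numericPart = label.replaceAll(find, replace);
        try {
            return Double.parseDouble(numericPart);
        } catch (NumberFormatException e) {
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("10", "2.5", "300");
        // The label "2.5" is the second label, so its internal value is 1.0.
        System.out.println(internalRepresentation(labels, "2.5")); // prints 1.0
        // With the default -find (.*) and -replace ($0) the label is parsed as is.
        System.out.println(parseLabel("2.5", ".*", "$0"));          // prints 2.5
    }
}
```

    The sketch shows why the two modes can differ for the same label: the internal representation depends only on label order, while parsing depends on the label text.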
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Constructor Detail

      • NominalToNumeric

        public NominalToNumeric()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions(String[] options)
                        throws Exception
        Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
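        The "set or reset" contract can be sketched without the filter. The class OptionsResetSketch below is hypothetical and only handles the four valued options documented for this class; flags without values are ignored for brevity. The point it illustrates is that every call starts from the defaults, so an option omitted from the array reverts to its default rather than keeping its previous value.

```java
// Hypothetical mini option holder; it only mimics the "set or reset" contract.
final class OptionsResetSketch {
    String index;
    String type;
    String find;
    String replace;

    OptionsResetSketch() {
        resetToDefaults();
    }

    // Defaults as listed in the option descriptions of this class.
    private void resetToDefaults() {
        index = "last";
        type = "INTERNAL_REPRESENTATION";
        find = ".*";
        replace = "$0";
    }

    // Every call first restores the defaults, then applies the supplied pairs.
    void setOptions(String[] options) {
        resetToDefaults();
        for (int i = 0; i + 1 < options.length; i += 2) {
            switch (options[i]) {
                case "-index":   index = options[i + 1];   break;
                case "-type":    type = options[i + 1];    break;
                case "-find":    find = options[i + 1];    break;
                case "-replace": replace = options[i + 1]; break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
    }

    public static void main(String[] args) {
        OptionsResetSketch handler = new OptionsResetSketch();
        handler.setOptions(new String[] {"-find", "(\\d+)", "-replace", "$1"});
        System.out.println(handler.find);    // prints (\d+)
        // A second call that omits -find resets it instead of keeping the old value.
        handler.setOptions(new String[] {"-index", "first"});
        System.out.println(handler.find);    // prints .*
    }
}
```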
      • getOptions

        public String[] getOptions()
        Gets the current option settings for the OptionHandler.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        the list of current option settings as an array of strings
      • reset

        protected void reset()
        Resets the filter.
        Overrides:
        reset in class weka.filters.SimpleFilter
      • getDefaultIndex

        protected WekaAttributeIndex getDefaultIndex()
        Returns the default attribute index.
        Returns:
        the default
      • setIndex

        public void setIndex(WekaAttributeIndex value)
        Sets the index of the attribute to convert.
        Parameters:
        value - the index
      • getIndex

        public WekaAttributeIndex getIndex()
        Returns the index of the attribute to convert.
        Returns:
        the index
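        How an index string might map to an attribute position can be sketched as follows. This is not the WekaAttributeIndex implementation: the class IndexSketch is hypothetical, and it assumes that last_1 and last_2 mean "one before last" and "two before last", and that an unquoted, non-placeholder string is tried as an attribute name before it is tried as a 1-based number (which is why '#' exists to force numeric interpretation).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical resolver for the index syntax described in the -index option.
final class IndexSketch {

    // Returns the 0-based attribute position, or -1 if the index cannot be resolved.
    static int resolve(String index, List<String> names) {
        int n = names.size();
        String s = index.trim();
        int pos;
        if (s.startsWith("#")) {
            // '#' forces numeric interpretation of a 1-based index.
            pos = parseOneBased(s.substring(1));
        } else if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            // Double quotes mark an attribute name (case-sensitive).
            pos = names.indexOf(s.substring(1, s.length() - 1));
        } else {
            switch (s) {
                case "first":  pos = 0;     break;
                case "second": pos = 1;     break;
                case "third":  pos = 2;     break;
                case "last_2": pos = n - 3; break;
                case "last_1": pos = n - 2; break;
                case "last":   pos = n - 1; break;
                default:
                    // Assumption: names win over numbers unless '#' is used.
                    pos = names.indexOf(s);
                    if (pos < 0) {
                        pos = parseOneBased(s);
                    }
            }
        }
        return (pos >= 0 && pos < n) ? pos : -1;
    }

    // Converts a 1-based numeric string to a 0-based position; -1 if not a number.
    private static int parseOneBased(String s) {
        try {
            return Integer.parseInt(s) - 1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("id", "12", "weight", "class");
        System.out.println(resolve("last", names));       // prints 3
        System.out.println(resolve("12", names));         // prints 1 (attribute named "12")
        System.out.println(resolve("#2", names));         // prints 1 (second attribute)
        System.out.println(resolve("\"weight\"", names)); // prints 2
    }
}
```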
      • indexTipText

        public String indexTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultType

        protected NominalToNumeric.ConversionType getDefaultType()
        Returns the default conversion type.
        Returns:
        the default
      • typeTipText

        public String typeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultFind

        protected BaseRegExp getDefaultFind()
        Returns the default regular expression for extracting the numeric part from the label.
        Returns:
        the default
      • setFind

        public void setFind(BaseRegExp value)
        Sets the regular expression to use for extracting the numeric part from the label.
        Parameters:
        value - the regexp
      • getFind

        public BaseRegExp getFind()
        Returns the regular expression to use for extracting the numeric part from the label.
        Returns:
        the regexp
      • findTipText

        public String findTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultReplace

        protected String getDefaultReplace()
        Returns the default expression for assembling the numeric part.
        Returns:
        the default
      • setReplace

        public void setReplace(String value)
        Sets the expression to use for assembling the numeric part.
        Parameters:
        value - the expression
      • getReplace

        public String getReplace()
        Returns the expression to use for assembling the numeric part.
        Returns:
        the expression
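        The interplay of the find and replace expressions can be shown with plain java.util.regex. The helper assemble below is hypothetical and assumes the filter combines the two expressions the way Matcher.replaceAll does, where $0 refers to the whole match and $1, $2, ... to capture groups; the sample labels are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing how a find/replace pair could assemble the numeric part.
final class FindReplaceSketch {

    // Applies the regular expression to the label and builds the replacement string.
    static String assemble(String label, String find, String replace) {
        return Pattern.compile(find).matcher(label).replaceAll(replace);
    }

    public static void main(String[] args) {
        // Defaults: match the whole label and use it as is.
        System.out.println(assemble("42", ".*", "$0"));                            // prints 42
        // Extract a number embedded in surrounding text via a capture group.
        System.out.println(assemble("weight_12.5kg", "^weight_([0-9.]+)kg$", "$1")); // prints 12.5
        // Assemble the numeric part from two groups, dropping a thousands separator.
        System.out.println(assemble("1,234", "^(\\d+),(\\d+)$", "$1$2"));           // prints 1234
    }
}
```

        Anchoring the expression with ^ and $ keeps unmatched text from leaking into the result, since replaceAll leaves any part of the label outside a match untouched.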
      • replaceTipText

        public String replaceTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns it. If the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, this method is called from batchFinished() after the call to preprocess(Instances), in which, e.g., statistics for the actual processing step can be gathered.
        Specified by:
        determineOutputFormat in class weka.filters.SimpleStreamFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
      • process

        protected weka.core.Instance process(weka.core.Instance instance)
                                      throws Exception
        Processes the given instance (may change the provided instance) and returns the modified version.
        Specified by:
        process in class weka.filters.SimpleStreamFilter
        Parameters:
        instance - the instance to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main(String[] args)
        Main method for testing this class.
        Parameters:
        args - should contain arguments to the filter: use -h for help