Class SpellChecker

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.StreamableFilter, weka.filters.UnsupervisedFilter

    public class SpellChecker
    extends weka.filters.SimpleStreamFilter
    implements weka.filters.UnsupervisedFilter
    A simple filter that merges misspelled labels into a single correct one.

    Valid options are:

     -D
      Turns on output of debugging information.
     -C <col>
      The index of the attribute to process.
      (default: last).
     -incorrect <blank separated labels>
      The incorrectly spelled labels.
      (default: none).
     -correct <label>
      The correct spelling for the labels.
      (default: correct).
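As a rough illustration of how the options above fit together, the following sketch builds an option array in plain Java. The class `SpellCheckerOptionsSketch`, its `buildOptions` method, and the example labels are hypothetical and not part of the filter. The sketch assumes the resulting array has the shape that `setOptions(String[])` expects, with all misspelled labels passed as one blank-separated argument.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka or of SpellChecker itself.
final class SpellCheckerOptionsSketch {

  // Builds an option array in the shape listed above. The misspelled labels
  // are joined with blanks because -incorrect takes one blank-separated argument.
  static String[] buildOptions(String column, List<String> incorrect, String correct) {
    return new String[] {
      "-C", column,
      "-incorrect", String.join(" ", incorrect),
      "-correct", correct
    };
  }

  public static void main(String[] args) {
    String[] options = buildOptions("last", Arrays.asList("colr", "collor"), "color");
    System.out.println(Arrays.toString(options));
  }
}
```

Because the list is blank-separated, a label that itself contains a blank cannot be expressed this way in the sketch.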
    Version:
    $Revision$
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected weka.core.SingleIndex m_AttributeIndex
      the index of the attribute to work on.
      protected String m_Correct
      the correct spelling for the labels.
      protected String[] m_Incorrect
      the (misspelled) labels of the attribute to replace.
      protected HashSet<String> m_IncorrectCache
      the hashset with the incorrect labels (for faster access).
      • Fields inherited from class weka.filters.Filter

        m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
    • Constructor Summary

      Constructors 
      Constructor Description
      SpellChecker()  
    • Method Summary

      Modifier and Type Method Description
      String attributeIndexTipText()
      Returns the tip text for this property.
      String correctTipText()
      Returns the tip text for this property.
      protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns it.
      String getAttributeIndex()
      Returns the 1-based index of the attribute to process.
      weka.core.Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      String getCorrect()
      Returns the correct label.
      String getIncorrect()
      Returns the incorrect labels as a blank-separated list.
      String[] getOptions()
      Gets the current settings of the filter.
      String getRevision()
      Returns the revision string.
      String globalInfo()
      Returns a string describing this filter.
      String incorrectTipText()
      Returns the tip text for this property.
      Enumeration listOptions()
      Returns an enumeration describing the available options.
      static void main(String[] args)
      Main method for testing this class.
      protected weka.core.Instance process(weka.core.Instance instance)
      Processes the given instance (may change the provided instance) and returns the modified version.
      protected void reset()
      Resets the filter.
      void setAttributeIndex(String value)
      Sets the attribute index (1-based) of the attribute to process.
      void setCorrect(String value)
      Sets the correct label.
      void setIncorrect(String value)
      Sets the incorrect labels as a blank-separated list.
      void setOptions(String[] options)
      Parses a list of options for this object.
      • Methods inherited from class weka.filters.SimpleStreamFilter

        batchFinished, hasImmediateOutputFormat, input, preprocess, process
      • Methods inherited from class weka.filters.SimpleFilter

        setInputFormat
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • m_AttributeIndex

        protected weka.core.SingleIndex m_AttributeIndex
        the index of the attribute to work on.
      • m_Incorrect

        protected String[] m_Incorrect
        the (misspelled) labels of the attribute to replace.
      • m_Correct

        protected String m_Correct
        the correct spelling for the labels.
      • m_IncorrectCache

        protected HashSet<String> m_IncorrectCache
        the hashset with the incorrect labels (for faster access).
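The relationship between the label array and its hash-set cache can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the filter's actual source: the class `IncorrectLabelsSketch` and its `isIncorrect` helper are hypothetical, and the sketch assumes the blank-separated string is split on whitespace and that the cache is rebuilt whenever the labels change.

```java
import java.util.Arrays;
import java.util.HashSet;

// Hypothetical stand-in for the m_Incorrect / m_IncorrectCache pair.
final class IncorrectLabelsSketch {

  private String[] incorrect = new String[0];
  private HashSet<String> incorrectCache = new HashSet<>();

  // Assumption: the blank-separated list is split on runs of whitespace.
  void setIncorrect(String value) {
    String trimmed = value.trim();
    incorrect = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    // Rebuild the cache so membership tests avoid scanning the array.
    incorrectCache = new HashSet<>(Arrays.asList(incorrect));
  }

  // Returns the labels joined back into a blank-separated list.
  String getIncorrect() {
    return String.join(" ", incorrect);
  }

  // Constant-time (on average) membership test backed by the hash set.
  boolean isIncorrect(String label) {
    return incorrectCache.contains(label);
  }

  public static void main(String[] args) {
    IncorrectLabelsSketch labels = new IncorrectLabelsSketch();
    labels.setIncorrect("colr collor");
    System.out.println(labels.getIncorrect());
    System.out.println(labels.isIncorrect("colr"));
  }
}
```

The array keeps the labels in the order given, while the set is there only for lookups, which matters when every incoming value has to be checked.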
    • Constructor Detail

      • SpellChecker

        public SpellChecker()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter GUI
      • reset

        protected void reset()
        Resets the filter.
        Overrides:
        reset in class weka.filters.SimpleFilter
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions(String[] options)
                        throws Exception
        Parses a list of options for this object. Also resets the state of the filter (this reset doesn't affect the options).
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
        See Also:
        reset()
      • getOptions

        public String[] getOptions()
        Gets the current settings of the filter.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        an array of strings suitable for passing to setOptions
      • setAttributeIndex

        public void setAttributeIndex(String value)
        Sets the attribute index (1-based) of the attribute to process.
        Parameters:
        value - the index (1-based)
      • getAttributeIndex

        public String getAttributeIndex()
        Returns the 1-based index of the attribute to process.
        Returns:
        the index (1-based)
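To make the 1-based convention concrete, the sketch below resolves an index string to a 0-based array position. It is an illustration only: the class `AttributeIndexSketch` and its `resolve` method are hypothetical, and the handling of the keywords "first" and "last" is modelled on how weka.core.SingleIndex is commonly used rather than copied from its source.

```java
// Hypothetical resolver for a 1-based attribute index string.
final class AttributeIndexSketch {

  // Converts "first", "last", or a 1-based number into a 0-based position.
  static int resolve(String spec, int numAttributes) {
    String s = spec.trim().toLowerCase();
    int index;
    if (s.equals("first")) {
      index = 0;
    } else if (s.equals("last")) {
      index = numAttributes - 1;
    } else {
      index = Integer.parseInt(s) - 1; // 1-based input, 0-based result
    }
    if (index < 0 || index >= numAttributes) {
      throw new IllegalArgumentException("Index out of range: " + spec);
    }
    return index;
  }

  public static void main(String[] args) {
    System.out.println(resolve("last", 5));
    System.out.println(resolve("1", 5));
  }
}
```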
      • attributeIndexTipText

        public String attributeIndexTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setIncorrect

        public void setIncorrect(String value)
                          throws Exception
        Sets the incorrect labels as a blank-separated list.
        Parameters:
        value - the labels
        Throws:
        Exception
      • getIncorrect

        public String getIncorrect()
        Returns the incorrect labels as a blank-separated list.
        Returns:
        the labels
      • incorrectTipText

        public String incorrectTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setCorrect

        public void setCorrect(String value)
        Sets the correct label.
        Parameters:
        value - the label
      • getCorrect

        public String getCorrect()
        Returns the correct label.
        Returns:
        the label
      • correctTipText

        public String correctTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the Capabilities of this filter. Derived filters have to override this method to enable capabilities.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this object
        See Also:
        Capabilities
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns it. In case the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, then this method will be called from batchFinished() after the call of preprocess(Instances), in which, e.g., statistics for the actual processing step can be gathered.
        Specified by:
        determineOutputFormat in class weka.filters.SimpleStreamFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
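For a filter that merges labels, one plausible way to derive the output label list from the input label list is sketched below. This is an assumption about the general technique, not a description of what this class does internally: the class `OutputLabelsSketch`, the `mergeLabels` method, and the choice to append the correct label at the end when it is missing are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of deriving the output labels of a nominal attribute.
final class OutputLabelsSketch {

  // Drops every misspelled label and makes sure the correct label is present.
  static List<String> mergeLabels(List<String> inputLabels, Set<String> incorrect, String correct) {
    List<String> result = new ArrayList<>();
    for (String label : inputLabels) {
      if (!incorrect.contains(label) && !result.contains(label)) {
        result.add(label);
      }
    }
    // Assumption: the correct label is appended if the input did not define it.
    if (!result.contains(correct)) {
      result.add(correct);
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> incorrect = new HashSet<>(Arrays.asList("colr", "collor"));
    List<String> input = Arrays.asList("red", "colr", "blue", "collor");
    System.out.println(mergeLabels(input, incorrect, "color"));
  }
}
```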
      • process

        protected weka.core.Instance process(weka.core.Instance instance)
                                      throws Exception
        Processes the given instance (may change the provided instance) and returns the modified version.
        Specified by:
        process in class weka.filters.SimpleStreamFilter
        Parameters:
        instance - the instance to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong
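The per-instance step can be pictured with the following plain-Java sketch, which treats a row as a string array instead of a weka.core.Instance. Everything here is hypothetical (the class `ProcessSketch`, the `processRow` method, the row representation). Unlike the documented method, which may change the provided instance, this sketch works on a copy to keep the example easy to follow.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of replacing a misspelled label in one row.
final class ProcessSketch {

  // Returns a copy of the row in which a misspelled value at attIndex
  // (0-based) is replaced by the correct label; other values are untouched.
  static String[] processRow(String[] row, int attIndex, Set<String> incorrect, String correct) {
    String[] result = row.clone();
    String value = result[attIndex];
    // A null entry stands in for a missing value and is left alone.
    if (value != null && incorrect.contains(value)) {
      result[attIndex] = correct;
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> incorrect = new HashSet<>(Arrays.asList("colr", "collor"));
    String[] row = {"1.0", "colr"};
    System.out.println(Arrays.toString(processRow(row, 1, incorrect, "color")));
  }
}
```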
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main(String[] args)
        Main method for testing this class.
        Parameters:
        args - should contain arguments to the filter: use -h for help