weka.filters.unsupervised.attribute
Class SpellChecker

java.lang.Object
  extended by weka.filters.Filter
      extended by weka.filters.SimpleFilter
          extended by weka.filters.SimpleStreamFilter
              extended by weka.filters.unsupervised.attribute.SpellChecker
All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.StreamableFilter, weka.filters.UnsupervisedFilter

public class SpellChecker
extends weka.filters.SimpleStreamFilter
implements weka.filters.UnsupervisedFilter

A simple filter that merges misspelled labels into a single correct one.

Valid options are:

 -D
  Turns on output of debugging information.
 -C <col>
  The index of the attribute to process.
  (default: last).
 -incorrect <blank separated labels>
  The incorrectly spelled labels.
  (default: none).
 -correct <label>
  The correct spelling for the labels.
  (default: correct).
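
As an illustration of the options listed above, the following minimal sketch assembles an option array in plain Java. The class name SpellCheckerOptionsSketch and the helper buildOptions are hypothetical and not part of Weka. The sketch assumes that the blank-separated labels are passed as a single argument following -incorrect; the resulting array is the kind of value one would hand to setOptions(String[]).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: builds an option array
// using the flags documented above (-C, -incorrect, -correct).
final class SpellCheckerOptionsSketch {

    static String[] buildOptions(String attributeIndex, String[] incorrect, String correct) {
        List<String> opts = new ArrayList<>();
        opts.add("-C");
        opts.add(attributeIndex);                 // e.g. "last" or a 1-based number
        opts.add("-incorrect");
        opts.add(String.join(" ", incorrect));    // assumption: one blank-separated argument
        opts.add("-correct");
        opts.add(correct);
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions("last", new String[] {"colour", "coluor"}, "color");
        System.out.println(String.join(" | ", options));
    }
}
```

The -D flag is left out here; it would be added as a single element without a value.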

Version:
$Revision: 4521 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Field Summary
protected  weka.core.SingleIndex m_AttributeIndex
          the index of the attribute to work on.
protected  String m_Correct
          the correct spelling for the labels.
protected  String[] m_Incorrect
          the (misspelled) labels of the attribute to replace.
protected  HashSet<String> m_IncorrectCache
          the hashset with the incorrect labels (for faster access).
 
Fields inherited from class weka.filters.SimpleFilter
m_Debug
 
Fields inherited from class weka.filters.Filter
m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
 
Constructor Summary
SpellChecker()
           
 
Method Summary
 String attributeIndexTipText()
          Returns the tip text for this property.
 String correctTipText()
          Returns the tip text for this property.
protected  weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
          Determines the output format based on the input format and returns it.
 String getAttributeIndex()
          Returns the 1-based index of the attribute to process.
 weka.core.Capabilities getCapabilities()
          Returns the Capabilities of this filter.
 String getCorrect()
          Returns the correct label.
 String getIncorrect()
          Returns the incorrect labels as a blank-separated list.
 String[] getOptions()
          Gets the current settings of the filter.
 String getRevision()
          Returns the revision string.
 String globalInfo()
          Returns a string describing this filter.
 String incorrectTipText()
          Returns the tip text for this property.
 Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(String[] args)
          Main method for testing this class.
protected  weka.core.Instance process(weka.core.Instance instance)
          Processes the given instance (may change the provided instance) and returns the modified version.
protected  void reset()
          Resets the filter.
 void setAttributeIndex(String value)
          Sets the attribute index (1-based) of the attribute to process.
 void setCorrect(String value)
          Sets the correct label.
 void setIncorrect(String value)
          Sets the incorrect labels from a blank-separated list.
 void setOptions(String[] options)
          Parses a list of options for this object.
 
Methods inherited from class weka.filters.SimpleStreamFilter
batchFinished, hasImmediateOutputFormat, input, preprocess, process
 
Methods inherited from class weka.filters.SimpleFilter
debugTipText, getDebug, setDebug, setInputFormat
 
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, filterFile, flushInput, getCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, push, resetQueue, runFilter, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_AttributeIndex

protected weka.core.SingleIndex m_AttributeIndex
the index of the attribute to work on.


m_Incorrect

protected String[] m_Incorrect
the (misspelled) labels of the attribute to replace.


m_Correct

protected String m_Correct
the correct spelling for the labels.


m_IncorrectCache

protected HashSet<String> m_IncorrectCache
the hashset with the incorrect labels (for faster access).
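
The relationship between m_Incorrect and m_IncorrectCache can be pictured with the sketch below. It is written in plain Java and does not reproduce the class's actual code; the names IncorrectCacheSketch, buildCache and isMisspelled are hypothetical. The idea is that the label array is copied into a HashSet once, so that each later membership check is a hash lookup rather than a scan over the array.

```java
import java.util.Arrays;
import java.util.HashSet;

// Hypothetical illustration, not the Weka source: derive a lookup set
// from the array of misspelled labels.
final class IncorrectCacheSketch {

    // Copies the labels into a HashSet; duplicates collapse automatically.
    static HashSet<String> buildCache(String[] incorrect) {
        return new HashSet<>(Arrays.asList(incorrect));
    }

    // Membership test against the cache (expected constant time).
    static boolean isMisspelled(HashSet<String> cache, String label) {
        return cache.contains(label);
    }

    public static void main(String[] args) {
        HashSet<String> cache = buildCache(new String[] {"colour", "coluor"});
        System.out.println(isMisspelled(cache, "coluor"));
        System.out.println(isMisspelled(cache, "color"));
    }
}
```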

Constructor Detail

SpellChecker

public SpellChecker()
Method Detail

globalInfo

public String globalInfo()
Returns a string describing this filter.

Specified by:
globalInfo in class weka.filters.SimpleFilter
Returns:
a description of the filter suitable for displaying in the explorer/experimenter gui

reset

protected void reset()
Resets the filter.

Overrides:
reset in class weka.filters.SimpleFilter

listOptions

public Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface weka.core.OptionHandler
Overrides:
listOptions in class weka.filters.SimpleFilter
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(String[] options)
                throws Exception
Parses a list of options for this object. Also resets the state of the filter (this reset doesn't affect the options).

Specified by:
setOptions in interface weka.core.OptionHandler
Overrides:
setOptions in class weka.filters.SimpleFilter
Parameters:
options - the list of options as an array of strings
Throws:
Exception - if an option is not supported
See Also:
reset()

getOptions

public String[] getOptions()
Gets the current settings of the filter.

Specified by:
getOptions in interface weka.core.OptionHandler
Overrides:
getOptions in class weka.filters.SimpleFilter
Returns:
an array of strings suitable for passing to setOptions

setAttributeIndex

public void setAttributeIndex(String value)
Sets the attribute index (1-based) of the attribute to process.

Parameters:
value - the index (1-based)

getAttributeIndex

public String getAttributeIndex()
Returns the 1-based index of the attribute to process.

Returns:
the index (1-based)
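
Because the index is exchanged as a 1-based string and the documented default is "last", it can help to see how such a value might map to a 0-based position. The sketch below is plain Java under stated assumptions: the class AttributeIndexSketch and its resolve method are hypothetical, only the keyword "last" and plain numbers are handled, and the real filter delegates this work to weka.core.SingleIndex rather than to code like this.

```java
// Hypothetical illustration, not the Weka source: turn a 1-based index
// string (or "last") into a 0-based attribute position.
final class AttributeIndexSketch {

    static int resolve(String index, int numAttributes) {
        if (numAttributes < 1) {
            throw new IllegalArgumentException("need at least one attribute");
        }
        String s = index.trim();
        if (s.equalsIgnoreCase("last")) {
            return numAttributes - 1;
        }
        int oneBased = Integer.parseInt(s);   // throws NumberFormatException on bad input
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("index out of range: " + oneBased);
        }
        return oneBased - 1;                  // convert to 0-based
    }

    public static void main(String[] args) {
        System.out.println(resolve("last", 5));
        System.out.println(resolve("1", 5));
    }
}
```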

attributeIndexTipText

public String attributeIndexTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setIncorrect

public void setIncorrect(String value)
                  throws Exception
Sets the incorrect labels from a blank-separated list.

Parameters:
value - the labels
Throws:
Exception

getIncorrect

public String getIncorrect()
Returns the incorrect labels as a blank-separated list.

Returns:
the labels
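
The blank-separated format used by setIncorrect and getIncorrect can be sketched as a simple split and join. This is a plain Java illustration with hypothetical names (BlankSeparatedLabelsSketch, parse, join); it assumes labels never contain blanks themselves and makes no attempt at quoting, which the actual class may handle differently.

```java
// Hypothetical illustration, not the Weka source: convert between a
// blank-separated string and an array of labels.
final class BlankSeparatedLabelsSketch {

    // Splits on runs of whitespace; an empty or blank string yields no labels.
    static String[] parse(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    // Joins the labels with single blanks.
    static String join(String[] labels) {
        return String.join(" ", labels);
    }

    public static void main(String[] args) {
        String[] labels = parse("  colour   coluor ");
        System.out.println(labels.length);
        System.out.println(join(labels));
    }
}
```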

incorrectTipText

public String incorrectTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setCorrect

public void setCorrect(String value)
Sets the correct label.

Parameters:
value - the label

getCorrect

public String getCorrect()
Returns the correct label.

Returns:
the label

correctTipText

public String correctTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

getCapabilities

public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter. Derived filters have to override this method to enable capabilities.

Specified by:
getCapabilities in interface weka.core.CapabilitiesHandler
Overrides:
getCapabilities in class weka.filters.Filter
Returns:
the capabilities of this object
See Also:
Capabilities

determineOutputFormat

protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
                                             throws Exception
Determines the output format based on the input format and returns it. In case the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, then this method will be called from batchFinished() after the call of preprocess(Instances), in which, e.g., statistics for the actual processing step can be gathered.

Specified by:
determineOutputFormat in class weka.filters.SimpleStreamFilter
Parameters:
inputFormat - the input format to base the output format on
Returns:
the output format
Throws:
Exception - in case the determination goes wrong
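
For a filter that merges labels, one plausible effect on the output format is that the processed attribute ends up with a shorter label list. The sketch below shows that idea in plain Java; it is an assumption about the behaviour, not a copy of the class's implementation, and the names MergedLabelsSketch and mergedLabels are hypothetical. Each misspelled label is mapped to the correct one and duplicates are dropped while the original order is kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the Weka source: compute the label list
// an attribute might have after misspelled labels are merged.
final class MergedLabelsSketch {

    static List<String> mergedLabels(List<String> inputLabels, Set<String> incorrect, String correct) {
        // LinkedHashSet keeps first-seen order and removes duplicates.
        LinkedHashSet<String> out = new LinkedHashSet<>();
        for (String label : inputLabels) {
            out.add(incorrect.contains(label) ? correct : label);
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        List<String> labels = List.of("red", "rde", "blue", "reed");
        System.out.println(mergedLabels(labels, Set.of("rde", "reed"), "red"));
    }
}
```

Note that in this sketch the correct label only appears in the result if it, or at least one of the misspelled labels, is present in the input.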

process

protected weka.core.Instance process(weka.core.Instance instance)
                              throws Exception
Processes the given instance (may change the provided instance) and returns the modified version.

Specified by:
process in class weka.filters.SimpleStreamFilter
Parameters:
instance - the instance to process
Returns:
the modified data
Throws:
Exception - in case the processing goes wrong
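
The per-instance step can be sketched without Weka types by treating a row as an array of strings. The code below is an illustration under assumptions: ProcessRowSketch and its process method are hypothetical, a null entry stands in for a missing value, and the row is copied rather than modified in place (the documented method may change the provided instance).

```java
import java.util.Set;

// Hypothetical illustration, not the Weka source: replace a misspelled
// label in one column of a row with the correct spelling.
final class ProcessRowSketch {

    static String[] process(String[] row, int attIndex, Set<String> incorrect, String correct) {
        String[] result = row.clone();            // leave the caller's row untouched
        String value = result[attIndex];
        if (value != null && incorrect.contains(value)) {
            result[attIndex] = correct;           // merge the misspelling
        }
        return result;
    }

    public static void main(String[] args) {
        String[] row = {"42", "coluor"};
        String[] fixed = process(row, 1, Set.of("colour", "coluor"), "color");
        System.out.println(String.join(",", fixed));
    }
}
```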

getRevision

public String getRevision()
Returns the revision string.

Specified by:
getRevision in interface weka.core.RevisionHandler
Overrides:
getRevision in class weka.filters.Filter
Returns:
the revision

main

public static void main(String[] args)
Main method for testing this class.

Parameters:
args - should contain arguments to the filter: use -h for help


Copyright © 2012 University of Waikato, Hamilton, NZ. All Rights Reserved.