Class SpellChecker
java.lang.Object
  weka.filters.Filter
    weka.filters.SimpleFilter
      weka.filters.SimpleStreamFilter
        weka.filters.unsupervised.attribute.SpellChecker
All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.StreamableFilter, weka.filters.UnsupervisedFilter
public class SpellChecker extends weka.filters.SimpleStreamFilter implements weka.filters.UnsupervisedFilter
A simple filter that merges misspelled labels into a single correct one.
Valid options are:
-D Turns on output of debugging information.
-C <col> The index of the attribute to process. (default: last).
-incorrect <blank separated labels> The incorrectly spelled labels. (default: none).
-correct <label> The correct spelling for the labels. (default: correct).
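The options above could be assembled into an option array roughly as follows. This is a JDK-only sketch, not code from the filter: the helper buildOptions and the example labels are hypothetical. It assumes the usual Weka option convention in which each flag consumes exactly one following array element, so the blank-separated incorrect labels are passed as a single element.

```java
import java.util.ArrayList;
import java.util.List;

class SpellCheckerOptionsSketch {

    // Hypothetical helper: builds an option array in the shape documented above.
    static String[] buildOptions(String attIndex, String[] incorrect, String correct) {
        List<String> opts = new ArrayList<>();
        opts.add("-C");
        opts.add(attIndex);                     // 1-based index or "last"
        opts.add("-incorrect");
        opts.add(String.join(" ", incorrect));  // assumption: one element holding all labels
        opts.add("-correct");
        opts.add(correct);
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = buildOptions("last", new String[] {"suny", "sunnny"}, "sunny");
        // prints: -C|last|-incorrect|suny sunnny|-correct|sunny
        System.out.println(String.join("|", opts));
    }
}
```

An array of this shape is what setOptions(String[]) would be expected to receive.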
- Version:
  $Revision$
- Author:
  fracpete (fracpete at waikato dot ac dot nz)
- See Also:
  Serialized Form
Field Summary
Fields (modifier and type, field, description):
- protected weka.core.SingleIndex m_AttributeIndex
  the index of the attribute to work on.
- protected String m_Correct
  the correct spelling for the labels.
- protected String[] m_Incorrect
  the (misspelled) labels of the attribute to replace.
- protected HashSet<String> m_IncorrectCache
  the hashset with the incorrect labels (for faster access).
-
Constructor Summary
Constructors:
- SpellChecker()
-
Method Summary
Methods (modifier and type, method, description):
- String attributeIndexTipText()
  Returns the tip text for this property.
- String correctTipText()
  Returns the tip text for this property.
- protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
  Determines the output format based on the input format and returns it.
- String getAttributeIndex()
  Returns the 1-based index of the attribute to process.
- weka.core.Capabilities getCapabilities()
  Returns the Capabilities of this filter.
- String getCorrect()
  Returns the correct label.
- String getIncorrect()
  Returns the incorrect labels as a blank-separated list.
- String[] getOptions()
  Gets the current settings of the filter.
- String getRevision()
  Returns the revision string.
- String globalInfo()
  Returns a string describing this filter.
- String incorrectTipText()
  Returns the tip text for this property.
- Enumeration listOptions()
  Returns an enumeration describing the available options.
- static void main(String[] args)
  Main method for testing this class.
- protected weka.core.Instance process(weka.core.Instance instance)
  Processes the given instance (may change the provided instance) and returns the modified version.
- protected void reset()
  Resets the filter.
- void setAttributeIndex(String value)
  Sets the attribute index (1-based) of the attribute to process.
- void setCorrect(String value)
  Sets the correct label.
- void setIncorrect(String value)
  Sets the incorrect labels, given as a blank-separated list.
- void setOptions(String[] options)
  Parses a list of options for this object.
Methods inherited from class weka.filters.SimpleStreamFilter
batchFinished, hasImmediateOutputFormat, input, preprocess, process
-
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
- Specified by:
  globalInfo in class weka.filters.SimpleFilter
- Returns:
  a description of the filter suitable for displaying in the Explorer/Experimenter GUI
-
reset
protected void reset()
Resets the filter.
- Overrides:
  reset in class weka.filters.SimpleFilter
-
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
  listOptions in interface weka.core.OptionHandler
- Overrides:
  listOptions in class weka.filters.Filter
- Returns:
  an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a list of options for this object. Also resets the state of the filter (this reset doesn't affect the options).
-
getOptions
public String[] getOptions()
Gets the current settings of the filter.
- Specified by:
  getOptions in interface weka.core.OptionHandler
- Overrides:
  getOptions in class weka.filters.Filter
- Returns:
  an array of strings suitable for passing to setOptions
-
setAttributeIndex
public void setAttributeIndex(String value)
Sets the attribute index (1-based) of the attribute to process.
- Parameters:
  value - the index (1-based)
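The 1-based convention means the string index has to be translated to a 0-based position before an attribute can be looked up. The sketch below is a JDK-only stand-in for that translation, not the filter's code: the class and method names are hypothetical. It assumes the keywords "first" and "last" are accepted in addition to numbers, as is conventional for weka.core.SingleIndex ("last" is the documented default of -C).

```java
class AttributeIndexSketch {

    // Hypothetical stand-in: converts a 1-based index string to a 0-based position.
    static int resolve(String value, int numAttributes) {
        String v = value.trim().toLowerCase();
        int index;
        if (v.equals("first")) {
            index = 0;
        } else if (v.equals("last")) {
            index = numAttributes - 1;
        } else {
            index = Integer.parseInt(v) - 1;  // 1-based to 0-based
        }
        if (index < 0 || index >= numAttributes) {
            throw new IllegalArgumentException(
                "Index '" + value + "' is out of range for " + numAttributes + " attributes");
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(resolve("1", 5));     // prints 0
        System.out.println(resolve("last", 5));  // prints 4
    }
}
```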
-
getAttributeIndex
public String getAttributeIndex()
Returns the 1-based index of the attribute to process.
- Returns:
  the index (1-based)
-
attributeIndexTipText
public String attributeIndexTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the GUI or for listing the options.
-
setIncorrect
public void setIncorrect(String value) throws Exception
Sets the incorrect labels, given as a blank-separated list.
- Parameters:
  value - the labels
- Throws:
  Exception
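A blank-separated list has to be split into individual labels, and the m_IncorrectCache field suggests those labels are also kept in a HashSet for fast lookup. The following JDK-only sketch illustrates that idea under stated assumptions: the class and method names are hypothetical, and it splits on plain whitespace, so quoted labels containing blanks (which the real filter may or may not support) are not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class IncorrectLabelsSketch {

    // Splits a blank-separated list into labels; an empty string yields no labels.
    static String[] parse(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    // Builds a lookup set so membership tests do not scan the array each time.
    static Set<String> toCache(String[] labels) {
        return new HashSet<>(Arrays.asList(labels));
    }

    // Inverse of parse for labels without blanks: joins them with single blanks.
    static String join(String[] labels) {
        return String.join(" ", labels);
    }

    public static void main(String[] args) {
        String[] labels = parse("  suny   sunnny ");
        System.out.println(join(labels));                      // prints: suny sunnny
        System.out.println(toCache(labels).contains("suny"));  // prints: true
    }
}
```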
-
getIncorrect
public String getIncorrect()
Returns the incorrect labels as a blank-separated list.
- Returns:
  the labels
-
incorrectTipText
public String incorrectTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the GUI or for listing the options.
-
setCorrect
public void setCorrect(String value)
Sets the correct label.
- Parameters:
  value - the label
-
getCorrect
public String getCorrect()
Returns the correct label.
- Returns:
  the label
-
correctTipText
public String correctTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the GUI or for listing the options.
-
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter. Derived filters have to override this method to enable capabilities.
- Specified by:
  getCapabilities in interface weka.core.CapabilitiesHandler
- Overrides:
  getCapabilities in class weka.filters.Filter
- Returns:
  the capabilities of this object
- See Also:
  Capabilities
-
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it. If the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, this method is called from batchFinished() after the call to preprocess(Instances), in which, e.g., statistics for the actual processing step can be gathered.
- Specified by:
  determineOutputFormat in class weka.filters.SimpleStreamFilter
- Parameters:
  inputFormat - the input format to base the output format on
- Returns:
  the output format
- Throws:
  Exception - in case the determination goes wrong
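Since the filter merges misspelled labels into a single correct one, the output attribute presumably carries a reduced label list. The sketch below shows one plausible way to derive it, using only the JDK. It is an assumption rather than the filter's actual logic: the names are hypothetical, and both the label order and the choice to append the correct label when it is absent are guesses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OutputLabelsSketch {

    // Hypothetical: drops the incorrect labels and makes sure the correct one is present.
    static List<String> mergedLabels(List<String> inputLabels, Set<String> incorrect, String correct) {
        List<String> out = new ArrayList<>();
        for (String label : inputLabels) {
            if (!incorrect.contains(label) && !out.contains(label)) {
                out.add(label);
            }
        }
        if (!out.contains(correct)) {
            out.add(correct);  // assumption: a missing correct label is appended at the end
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> incorrect = new HashSet<>(Arrays.asList("suny", "sunnny"));
        List<String> labels = Arrays.asList("sunny", "suny", "rainy", "sunnny");
        System.out.println(mergedLabels(labels, incorrect, "sunny"));  // prints: [sunny, rainy]
    }
}
```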
-
process
protected weka.core.Instance process(weka.core.Instance instance) throws Exception
Processes the given instance (may change the provided instance) and returns the modified version.
- Specified by:
  process in class weka.filters.SimpleStreamFilter
- Parameters:
  instance - the instance to process
- Returns:
  the modified data
- Throws:
  Exception - in case the processing goes wrong
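Per instance, the correction amounts to replacing the value of the selected attribute when it is one of the incorrect labels. The sketch below illustrates this with a plain string array standing in for an instance. The names are hypothetical, null is used here to represent a missing value, and the sketch copies the row, whereas the real method may modify the provided instance.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ProcessValueSketch {

    // Returns the correct label for a misspelled one; other values pass through unchanged.
    static String correctValue(String label, Set<String> incorrectCache, String correct) {
        if (label == null) {
            return null;  // missing values are left alone in this sketch
        }
        return incorrectCache.contains(label) ? correct : label;
    }

    // Applies the correction to one position of a row and returns a modified copy.
    static String[] processRow(String[] row, int attIndex, Set<String> incorrectCache, String correct) {
        String[] copy = row.clone();
        copy[attIndex] = correctValue(copy[attIndex], incorrectCache, correct);
        return copy;
    }

    public static void main(String[] args) {
        Set<String> cache = new HashSet<>(Arrays.asList("suny", "sunnny"));
        String[] row = {"85", "high", "suny"};
        // prints: [85, high, sunny]
        System.out.println(Arrays.toString(processRow(row, 2, cache, "sunny")));
    }
}
```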
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
  getRevision in interface weka.core.RevisionHandler
- Overrides:
  getRevision in class weka.filters.Filter
- Returns:
  the revision
-
main
public static void main(String[] args)
Main method for testing this class.
- Parameters:
  args - should contain arguments to the filter: use -h for help