Package weka.core.tokenizers.cleaners
Class NormalizeDuplicateChars
- java.lang.Object
  - weka.core.tokenizers.cleaners.AbstractTokenCleaner
    - weka.core.tokenizers.cleaners.NormalizeDuplicateChars
- All Implemented Interfaces:
- Serializable, weka.core.OptionHandler, TokenCleaner
public class NormalizeDuplicateChars extends AbstractTokenCleaner
Replaces each run of consecutive duplicate characters with a single one. E.g., 'oooooh noooo!!!!' becomes 'oh no!'.
- Version:
- $Revision$
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
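The effect described above can be approximated in a few lines. The following is a minimal standalone sketch, not the Weka source: the class name `DuplicateCharsSketch`, the method `collapse`, and the regex-based approach are assumptions chosen for illustration, and the real cleaner may be implemented differently.

```java
import java.util.regex.Pattern;

// Standalone sketch; this is NOT the Weka class, only an approximation
// of the behavior its description gives.
class DuplicateCharsSketch {

    // Matches any character followed by one or more copies of itself.
    private static final Pattern RUN = Pattern.compile("(.)\\1+", Pattern.DOTALL);

    // Collapses every run of identical consecutive characters to one character.
    static String collapse(String token) {
        if (token == null) {
            return null;
        }
        return RUN.matcher(token).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("oooooh noooo!!!!")); // prints "oh no!"
        System.out.println(collapse("hello"));            // prints "helo"
    }
}
```

In this sketch, legitimately doubled letters are collapsed as well ('hello' becomes 'helo'), which is worth keeping in mind when deciding where such a cleaner fits in a tokenization pipeline.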
Constructor Summary
- NormalizeDuplicateChars()
Method Summary
- String clean(String token)
  Cleans the token; returns the clean token, or null if the token should be ignored.
- String globalInfo()
  Returns a string describing the cleaner.
Methods inherited from class weka.core.tokenizers.cleaners.AbstractTokenCleaner
getOptions, listOptions, reset, setOptions
Method Detail
globalInfo
public String globalInfo()
Returns a string describing the cleaner.
- Specified by:
- globalInfo in class AbstractTokenCleaner
- Returns:
- a description suitable for displaying in the Explorer/Experimenter GUI
clean
public String clean(String token)
Cleans the token; returns the clean token, or null if the token should be ignored.
- Specified by:
- clean in interface TokenCleaner
- Specified by:
- clean in class AbstractTokenCleaner
- Parameters:
- token - the token to check
- Returns:
- the clean token or null to ignore
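The "null to ignore" return value implies that callers should drop a token when a cleaner returns null. A minimal sketch of that calling pattern follows. It does not use the Weka classes: `CleanerContractSketch`, `cleanAll`, and the stand-in lambda are hypothetical, and whether NormalizeDuplicateChars itself ever returns null is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Standalone sketch of the "clean token or null to ignore" contract.
class CleanerContractSketch {

    // Applies a cleaner to each token and drops tokens for which it returns null.
    static List<String> cleanAll(List<String> tokens, UnaryOperator<String> cleaner) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = cleaner.apply(token);
            if (cleaned != null) {
                kept.add(cleaned);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in cleaner: ignores empty tokens (returns null),
        // otherwise collapses runs of repeated characters.
        UnaryOperator<String> cleaner =
            t -> t.isEmpty() ? null : t.replaceAll("(.)\\1+", "$1");
        System.out.println(cleanAll(Arrays.asList("noooo", "", "!!!"), cleaner)); // prints [no, !]
    }
}
```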