Package weka.core.tokenizers.cleaners
Class RemoveNonWordCharTokens
- java.lang.Object
  - weka.core.tokenizers.cleaners.AbstractTokenCleaner
    - weka.core.tokenizers.cleaners.RemoveNonWordCharTokens
- All Implemented Interfaces: Serializable, weka.core.OptionHandler, TokenCleaner
public class RemoveNonWordCharTokens extends AbstractTokenCleaner
Removes tokens that contain non-word characters. The matching sense can be inverted, in which case only tokens with non-word characters are returned.
- Version: $Revision$
- Author: FracPete (fracpete at waikato dot ac dot nz)
- See Also: PATTERN, Serialized Form
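The behaviour described above can be sketched without the Weka classes. The snippet below is a minimal stand-alone illustration, not the class's implementation. It assumes that "non-word character" means the regular-expression class `\W` (the actual value of PATTERN is not shown on this page), and the class name `NonWordTokenSketch` is hypothetical. Following the contract of clean(String), it returns the token to keep it and null to drop it.

```java
import java.util.regex.Pattern;

/** Stand-alone sketch; it does not use the Weka classes. */
class NonWordTokenSketch {

    // Assumption: a "non-word character" is anything matched by \W.
    // The real value of RemoveNonWordCharTokens.PATTERN may differ.
    static final Pattern NON_WORD = Pattern.compile("\\W");

    /** Returns the token if it is kept, or null if it is dropped. */
    static String clean(String token, boolean invert) {
        boolean hasNonWord = NON_WORD.matcher(token).find();
        // Default: drop tokens that contain a non-word character.
        // Inverted: keep only those tokens and drop all others.
        return (hasNonWord != invert) ? null : token;
    }

    public static void main(String[] args) {
        System.out.println(clean("hello", false)); // hello
        System.out.println(clean("can't", false)); // null
        System.out.println(clean("can't", true));  // can't
        System.out.println(clean("hello", true));  // null
    }
}
```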
Constructor Summary
Constructors
- RemoveNonWordCharTokens()
Method Summary
- String clean(String token): Determines whether a token is clean; returns the clean token, or null to ignore it.
- boolean getInvert(): Returns whether the matching sense is inverted, i.e., whether only tokens with non-word characters are kept rather than removed.
- String[] getOptions(): Gets the current option settings for the OptionHandler.
- String globalInfo(): Returns a string describing the cleaner.
- String invertTipText(): Returns the tip text for this property.
- Enumeration listOptions(): Returns an enumeration describing the available options.
- protected void reset(): Resets the cleaner.
- void setInvert(boolean value): Sets whether to invert the matching sense, i.e., keep only tokens with non-word characters rather than removing them.
- void setOptions(String[] options): Sets the OptionHandler's options using the given list.
Field Detail
PATTERN
public static final String PATTERN
The pattern to use.
- See Also: Constant Field Values

INVERT
public static final String INVERT
- See Also: Constant Field Values

m_Invert
protected boolean m_Invert
Whether to invert the matching sense.

m_Pattern
protected transient Pattern m_Pattern
The pattern in use.
Method Detail
globalInfo
public String globalInfo()
Returns a string describing the cleaner.
- Specified by: globalInfo in class AbstractTokenCleaner
- Returns: a description suitable for displaying in the Explorer/Experimenter GUI
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by: listOptions in interface weka.core.OptionHandler
- Overrides: listOptions in class AbstractTokenCleaner
- Returns: an enumeration of all the available options
setOptions
public void setOptions(String[] options) throws Exception
Sets the OptionHandler's options using the given list. All options are set (or reset) during this call, i.e., incremental setting of options is not possible.
- Specified by: setOptions in interface weka.core.OptionHandler
- Overrides: setOptions in class AbstractTokenCleaner
- Parameters: options - the list of options as an array of strings
- Throws: Exception - if an option is not supported
getOptions
public String[] getOptions()
Gets the current option settings for the OptionHandler.
- Specified by: getOptions in interface weka.core.OptionHandler
- Overrides: getOptions in class AbstractTokenCleaner
- Returns: the list of current option settings as an array of strings
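The non-incremental behaviour of setOptions, and the round trip through getOptions, can be illustrated with a small stand-alone sketch. It does not use the Weka classes. The flag name `-invert` is an assumption, because the value of the INVERT constant is not shown on this page, and the class `InvertOptionSketch` is hypothetical. Unlike the documented method, the sketch silently ignores unsupported options instead of throwing an Exception.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone sketch of a single boolean option; not the Weka implementation. */
class InvertOptionSketch {

    // Hypothetical flag name; the real value of INVERT is not documented here.
    static final String FLAG = "-invert";

    private boolean invert;

    /** Sets every option from scratch: an absent flag resets invert to false. */
    void setOptions(String[] options) {
        boolean found = false;
        for (String option : options) {
            if (FLAG.equals(option)) {
                found = true;
            }
        }
        invert = found;
    }

    /** Returns the current settings in a form that setOptions accepts. */
    String[] getOptions() {
        List<String> result = new ArrayList<>();
        if (invert) {
            result.add(FLAG);
        }
        return result.toArray(new String[0]);
    }

    boolean getInvert() {
        return invert;
    }

    public static void main(String[] args) {
        InvertOptionSketch sketch = new InvertOptionSketch();
        sketch.setOptions(new String[] {FLAG});
        System.out.println(sketch.getInvert());          // true
        sketch.setOptions(new String[0]);                // flag absent: reset
        System.out.println(sketch.getInvert());          // false
        System.out.println(sketch.getOptions().length);  // 0
    }
}
```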
setInvert
public void setInvert(boolean value)
Sets whether to invert the matching sense, i.e., keep only tokens with non-word characters rather than removing them.
- Parameters: value - true to invert the matching sense
getInvert
public boolean getInvert()
Returns whether the matching sense is inverted, i.e., whether only tokens with non-word characters are kept rather than removed.
- Returns: true if the matching sense is inverted
invertTipText
public String invertTipText()
Returns the tip text for this property.
- Returns: tip text for this property, suitable for displaying in the GUI or for listing the options
reset
protected void reset()
Resets the cleaner.
- Overrides: reset in class AbstractTokenCleaner
clean
public String clean(String token)
Determines whether a token is clean or not. A clean token is returned; a null return value means the token is to be ignored.
- Specified by: clean in interface TokenCleaner
- Specified by: clean in class AbstractTokenCleaner
- Parameters: token - the token to check
- Returns: the clean token, or null to ignore
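A caller is expected to drop every token for which clean returns null. The following stand-alone sketch shows one way to apply that contract to a list of tokens. It uses `java.util.function.UnaryOperator<String>` as a stand-in for a cleaner, and the class `CleanContractSketch`, the helper `applyCleaner`, and the `\W` rule are all assumptions made for illustration, not part of the Weka API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

/** Stand-alone sketch of the "null means ignore" contract; not the Weka API. */
class CleanContractSketch {

    // Assumed rule: a token is dropped if it contains a character matched by \W.
    static final Pattern NON_WORD = Pattern.compile("\\W");

    // Hypothetical stand-in for a cleaner's clean(String) method.
    static final UnaryOperator<String> DROP_NON_WORD =
        token -> NON_WORD.matcher(token).find() ? null : token;

    /** Keeps only the tokens for which the cleaner returns a non-null value. */
    static List<String> applyCleaner(List<String> tokens, UnaryOperator<String> cleaner) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = cleaner.apply(token);
            if (cleaned != null) {
                kept.add(cleaned);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("good", ":-)", "day", "#tag");
        System.out.println(applyCleaner(tokens, DROP_NON_WORD)); // [good, day]
    }
}
```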