Package weka.core.tokenizers.cleaners
Class RemoveNonWordCharTokens
- java.lang.Object
-
- weka.core.tokenizers.cleaners.AbstractTokenCleaner
-
- weka.core.tokenizers.cleaners.RemoveNonWordCharTokens
-
- All Implemented Interfaces:
Serializable, weka.core.OptionHandler, TokenCleaner
public class RemoveNonWordCharTokens extends AbstractTokenCleaner
Removes tokens that contain non-word characters. Matching sense can be inverted, i.e., only tokens with non-word characters get returned.
- Version:
- $Revision$
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
PATTERN, Serialized Form
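The behaviour in the class description above can be sketched without Weka on the classpath. This is a minimal stand-in, not the library's implementation: the class name `NonWordFilterSketch` is hypothetical, and the regular expression `\W` is an assumption, because the value of PATTERN is not shown on this page.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration; not the Weka class. */
final class NonWordFilterSketch {
    // Assumption: a "non-word character" is anything matched by the regex class \W.
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    /** Returns the token if it is kept, or null if it is dropped. */
    static String clean(String token, boolean invert) {
        boolean hasNonWord = NON_WORD.matcher(token).find();
        // Default: keep tokens without non-word characters.
        // Inverted: keep only tokens that contain at least one.
        return (hasNonWord == invert) ? token : null;
    }

    public static void main(String[] args) {
        System.out.println(clean("hello", false));
        System.out.println(clean("he!!o", false));
        System.out.println(clean("he!!o", true));
    }
}
```

With the real class, the same switch is controlled through setInvert(boolean) and the token is passed to clean(String).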
-
-
Constructor Summary
RemoveNonWordCharTokens()
-
Method Summary
Modifier and Type, Method, Description
String clean(String token)
Determines whether a token is clean or not.
boolean getInvert()
Returns whether to invert the matching sense, i.e., keep only the tokens with non-word characters rather than removing them.
String[] getOptions()
Gets the current option settings for the OptionHandler.
String globalInfo()
Returns a string describing the cleaner.
String invertTipText()
Returns the tip text for this property.
Enumeration listOptions()
Returns an enumeration describing the available options.
protected void reset()
Resets the cleaner.
void setInvert(boolean value)
Sets whether to invert the matching sense, i.e., keep only the tokens with non-word characters rather than removing them.
void setOptions(String[] options)
Sets the OptionHandler's options using the given list.
-
-
-
Field Detail
-
PATTERN
public static final String PATTERN
the pattern to use.
- See Also:
- Constant Field Values
-
INVERT
public static final String INVERT
- See Also:
- Constant Field Values
-
m_Invert
protected boolean m_Invert
whether to invert the matching sense.
-
m_Pattern
protected transient Pattern m_Pattern
the pattern in use.
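Because m_Pattern is declared transient and the class is Serializable, the compiled pattern is not part of the serialized form. One common way to handle such a field is to compile it lazily on first use. The sketch below illustrates that idiom only; whether RemoveNonWordCharTokens does exactly this is an assumption, and the class name `LazyPatternSketch`, its methods, and the pattern string `\W` are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

/** Hypothetical illustration of a lazily compiled transient pattern. */
final class LazyPatternSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assumption: the real PATTERN value is not shown on this page.
    static final String PATTERN = "\\W";

    // Not serialized; null on a fresh or freshly deserialized instance.
    private transient Pattern pattern;

    boolean containsNonWord(String token) {
        if (pattern == null) {
            pattern = Pattern.compile(PATTERN); // compile on first use
        }
        return pattern.matcher(token).find();
    }

    boolean isCompiled() {
        return pattern != null;
    }

    /** Serializes and deserializes the instance in memory. */
    static LazyPatternSketch roundTrip(LazyPatternSketch in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LazyPatternSketch) oin.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```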
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the cleaner.
- Specified by:
globalInfo in class AbstractTokenCleaner
- Returns:
- a description suitable for displaying in the Explorer/Experimenter GUI
-
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface weka.core.OptionHandler
- Overrides:
listOptions in class AbstractTokenCleaner
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e., incremental setting of options is not possible).
- Specified by:
setOptions in interface weka.core.OptionHandler
- Overrides:
setOptions in class AbstractTokenCleaner
- Parameters:
options - the list of options as an array of strings
- Throws:
Exception - if an option is not supported
-
getOptions
public String[] getOptions()
Gets the current option settings for the OptionHandler.
- Specified by:
getOptions in interface weka.core.OptionHandler
- Overrides:
getOptions in class AbstractTokenCleaner
- Returns:
- the list of current option settings as an array of strings
-
setInvert
public void setInvert(boolean value)
Sets whether to invert the matching sense, i.e., keep only the tokens with non-word characters rather than removing them.
- Parameters:
value - true to invert the matching sense
-
getInvert
public boolean getInvert()
Returns whether to invert the matching sense, i.e., keep only the tokens with non-word characters rather than removing them.
- Returns:
- true if the matching sense is inverted
-
invertTipText
public String invertTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
reset
protected void reset()
Resets the cleaner.
- Overrides:
reset in class AbstractTokenCleaner
-
clean
public String clean(String token)
Determines whether a token is clean or not.
- Specified by:
clean in interface TokenCleaner
- Specified by:
clean in class AbstractTokenCleaner
- Parameters:
token - the token to check
- Returns:
- the clean token or null to ignore
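The return contract above (the token itself, or null to ignore it) means a caller has to skip null results when collecting tokens. A minimal sketch of such a caller follows; `CleanerPipelineSketch` and `applyCleaner` are hypothetical names, and a plain `UnaryOperator<String>` stands in for a TokenCleaner so the example runs without Weka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical caller that honours the "null means ignore" contract. */
final class CleanerPipelineSketch {
    /**
     * Applies the cleaner to every token and keeps only non-null results.
     * The cleaner stands in for a call such as cleaner.clean(token).
     */
    static List<String> applyCleaner(List<String> tokens, UnaryOperator<String> cleaner) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = cleaner.apply(token);
            if (cleaned != null) { // null signals that the token is to be ignored
                kept.add(cleaned);
            }
        }
        return kept;
    }
}
```

With the real class, the operator could be written as a method reference to the cleaner's clean(String) method.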
-
-