Package weka.core.tokenizers.cleaners
Class RegExp
- java.lang.Object
  - weka.core.tokenizers.cleaners.AbstractTokenCleaner
    - weka.core.tokenizers.cleaners.RegExp
- All Implemented Interfaces:
Serializable, weka.core.OptionHandler, weka.core.tokenizers.cleaners.TokenCleaner
public class RegExp extends weka.core.tokenizers.cleaners.AbstractTokenCleaner
Cleans tokens based on regular expressions: if a token matches the regular expression, it gets replaced with the specified expression.
- Version:
- $Revision$
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
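This page does not say exactly how `clean` applies the expression. The following is a minimal sketch, assuming every match inside the token is rewritten with `java.util.regex.Matcher.replaceAll` and that the pattern is compiled lazily on first use. The class `RegExpCleanerSketch` is hypothetical and is not Weka's implementation.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical regex token cleaner. Assumption: matches are rewritten with
 * Matcher.replaceAll; the real RegExp class may behave differently.
 */
public class RegExpCleanerSketch {
    private final String find;     // regular expression that selects text to clean
    private final String replace;  // replacement expression
    private Pattern pattern;       // compiled lazily on the first clean() call

    public RegExpCleanerSketch(String find, String replace) {
        this.find = find;
        this.replace = replace;
    }

    /** Returns the token with every match of the expression replaced. */
    public String clean(String token) {
        if (pattern == null) {
            pattern = Pattern.compile(find);
        }
        return pattern.matcher(token).replaceAll(replace);
    }

    public static void main(String[] args) {
        RegExpCleanerSketch digits = new RegExpCleanerSketch("[0-9]+", "#");
        System.out.println(digits.clean("route66"));  // prints "route#"
        System.out.println(digits.clean("weka"));     // prints "weka" (no match)
    }
}
```

Under this assumption a token that does not match passes through unchanged; if the real cleaner instead requires a full-token match, only tokens matching the whole expression would change.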
Constructor Summary
| Constructor | Description |
|---|---|
| RegExp() | |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| String | clean(String token) | Cleans the given token and returns the result, or null if the token is to be ignored. |
| String | findTipText() | Returns the tip text for the find property. |
| protected String | getDefaultFind() | Returns the default regular expression for finding tokens to clean. |
| protected String | getDefaultReplace() | Returns the default expression used to replace matching tokens. |
| String | getFind() | Returns the regular expression to use for finding tokens to clean. |
| String[] | getOptions() | Gets the current option settings for the OptionHandler. |
| String | getReplace() | Returns the expression used to replace matching tokens. |
| String | globalInfo() | Returns a string describing the cleaner. |
| Enumeration | listOptions() | Returns an enumeration describing the available options. |
| String | replaceTipText() | Returns the tip text for the replace property. |
| protected void | reset() | Resets the cleaner. |
| void | setFind(String value) | Sets the regular expression to use for finding tokens to clean. |
| void | setOptions(String[] options) | Sets the OptionHandler's options using the given list. |
| void | setReplace(String value) | Sets the expression used to replace matching tokens. |
Field Detail

FIND
public static final String FIND
- See Also:
- Constant Field Values

REPLACE
public static final String REPLACE
- See Also:
- Constant Field Values

m_Find
protected String m_Find
The regular expression to use.

m_Replace
protected String m_Replace
The replacement to use.

m_Pattern
protected transient Pattern m_Pattern
The compiled pattern.
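Because `m_Pattern` is `transient`, it is not written when the cleaner is serialized and comes back as `null` after deserialization, so it has to be rebuilt from the regex string. The sketch below shows that lazy-rebuild idiom with an in-memory serialization round trip; `TransientPatternSketch`, `getPattern()` and `roundTrip()` are hypothetical names, not part of Weka. Note that `java.util.regex.Pattern` is itself `Serializable`, so marking it transient is presumably a design choice that keeps the regex string as the single source of truth.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

public class TransientPatternSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    protected String find = "[0-9]+";     // serialized: the regex source text
    protected transient Pattern pattern;  // skipped by serialization, rebuilt on demand

    /** Compiles the pattern on first use, including after deserialization. */
    protected Pattern getPattern() {
        if (pattern == null) {
            pattern = Pattern.compile(find);
        }
        return pattern;
    }

    /** Serializes and deserializes an object in memory. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TransientPatternSketch original = new TransientPatternSketch();
        original.getPattern();  // fill the cache before serializing
        TransientPatternSketch copy = roundTrip(original);
        System.out.println(copy.pattern == null);         // prints "true"
        System.out.println(copy.getPattern().pattern());  // prints "[0-9]+"
    }
}
```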

Method Detail
globalInfo
public String globalInfo()
Returns a string describing the cleaner.
- Specified by: globalInfo in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Returns: a description suitable for displaying in the Explorer/Experimenter GUI

listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by: listOptions in interface weka.core.OptionHandler
- Overrides: listOptions in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Returns: an enumeration of all the available options

setOptions
public void setOptions(String[] options) throws Exception
Sets the OptionHandler's options using the given list. All options are set (or reset) during this call, i.e., incremental setting of options is not possible.
- Specified by: setOptions in interface weka.core.OptionHandler
- Overrides: setOptions in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Parameters: options - the list of options as an array of strings
- Throws: Exception - if an option is not supported
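The "set or reset everything" rule means an option missing from the array falls back to its default rather than keeping its previous value. The sketch below illustrates that rule only. The flag strings `-F`/`-R` and the default values are placeholders (the real option names are held in the `FIND` and `REPLACE` constants, whose values this page does not show), and unlike the documented method this sketch silently ignores unknown options instead of throwing.

```java
public class OptionResetSketch {
    // Placeholder flags and defaults; RegExp defines its own via the FIND/REPLACE
    // constants and getDefaultFind()/getDefaultReplace().
    static final String FIND_FLAG = "-F";
    static final String REPLACE_FLAG = "-R";
    static final String DEFAULT_FIND = "[0-9]+";
    static final String DEFAULT_REPLACE = "#";

    String find = DEFAULT_FIND;
    String replace = DEFAULT_REPLACE;

    /** Returns the value following the flag, or null if the flag is absent. */
    static String optionValue(String flag, String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                return options[i + 1];
            }
        }
        return null;
    }

    /** Sets every option; any option not supplied is reset to its default. */
    public void setOptions(String[] options) {
        String f = optionValue(FIND_FLAG, options);
        String r = optionValue(REPLACE_FLAG, options);
        find = (f != null) ? f : DEFAULT_FIND;
        replace = (r != null) ? r : DEFAULT_REPLACE;
    }

    public static void main(String[] args) {
        OptionResetSketch s = new OptionResetSketch();
        s.setOptions(new String[]{"-F", "[a-z]+", "-R", "_"});
        System.out.println(s.find + " " + s.replace);  // prints "[a-z]+ _"
        s.setOptions(new String[]{"-R", "*"});         // find flag omitted
        System.out.println(s.find + " " + s.replace);  // prints "[0-9]+ *"
    }
}
```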

getOptions
public String[] getOptions()
Gets the current option settings for the OptionHandler.
- Specified by: getOptions in interface weka.core.OptionHandler
- Overrides: getOptions in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Returns: the list of current option settings as an array of strings
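A common use of the option array is copying a configuration: the output of `getOptions()` on one instance is fed to `setOptions()` on another. The sketch below shows that round trip under stated assumptions: `OptionRoundTripSketch` is hypothetical, the flags `-F`/`-R` are placeholders for the names stored in `FIND` and `REPLACE`, and unknown flags are simply skipped here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OptionRoundTripSketch {
    // Placeholder flags; the real option names come from RegExp.FIND and RegExp.REPLACE.
    static final String FIND_FLAG = "-F";
    static final String REPLACE_FLAG = "-R";

    String find = "";
    String replace = "";

    /** Emits the current settings as flag/value pairs. */
    public String[] getOptions() {
        List<String> result = new ArrayList<>();
        result.add(FIND_FLAG);
        result.add(find);
        result.add(REPLACE_FLAG);
        result.add(replace);
        return result.toArray(new String[0]);
    }

    /** Reads flag/value pairs back in; unknown flags are skipped in this sketch. */
    public void setOptions(String[] options) {
        for (int i = 0; i + 1 < options.length; i += 2) {
            if (options[i].equals(FIND_FLAG)) {
                find = options[i + 1];
            } else if (options[i].equals(REPLACE_FLAG)) {
                replace = options[i + 1];
            }
        }
    }

    public static void main(String[] args) {
        OptionRoundTripSketch a = new OptionRoundTripSketch();
        a.find = "\\d+";
        a.replace = "<num>";
        OptionRoundTripSketch b = new OptionRoundTripSketch();
        b.setOptions(a.getOptions());  // copy the settings via the option array
        System.out.println(Arrays.toString(b.getOptions()));  // prints "[-F, \d+, -R, <num>]"
    }
}
```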

reset
protected void reset()
Resets the cleaner.
- Overrides: reset in class weka.core.tokenizers.cleaners.AbstractTokenCleaner

getDefaultFind
protected String getDefaultFind()
Returns the default regular expression for finding tokens to clean.
- Returns: the default regular expression
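The protected `reset()`, `getDefaultFind()` and `getDefaultReplace()` methods suggest a template-method arrangement in which `reset()` reloads settings from overridable default getters, so a subclass can change the defaults by overriding two methods. This is a minimal sketch under that assumption; it also assumes `reset()` drops any cached pattern. `BaseCleaner`, `PunctuationCleaner` and their default values are hypothetical.

```java
import java.util.regex.Pattern;

public class DefaultsSketch {
    /** Hypothetical base class: settings come from overridable default getters. */
    static class BaseCleaner {
        protected String find;
        protected String replace;
        protected Pattern pattern;  // cached compiled form of find

        BaseCleaner() {
            reset();
        }

        protected String getDefaultFind() { return ""; }
        protected String getDefaultReplace() { return ""; }

        /** Restores the defaults and drops the cached pattern. */
        protected void reset() {
            find = getDefaultFind();
            replace = getDefaultReplace();
            pattern = null;
        }
    }

    /** Hypothetical subclass that, by default, strips trailing punctuation. */
    static class PunctuationCleaner extends BaseCleaner {
        @Override protected String getDefaultFind() { return "\\p{Punct}+$"; }
        @Override protected String getDefaultReplace() { return ""; }
    }

    public static void main(String[] args) {
        PunctuationCleaner c = new PunctuationCleaner();
        c.find = "x";                // simulate a user change
        c.reset();                   // back to the subclass default
        System.out.println(c.find);  // prints "\p{Punct}+$"
    }
}
```

Calling the overridable `reset()` from the constructor is safe in this sketch only because the overriding getters return constants and read no subclass fields.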

setFind
public void setFind(String value)
Sets the regular expression to use for finding tokens to clean.
- Parameters: value - the regular expression

getFind
public String getFind()
Returns the regular expression to use for finding tokens to clean.
- Returns: the regular expression
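When the expression can change after the pattern has been compiled, the setter has to keep the cached `Pattern` consistent with the new string. The sketch below is one way to do that: it compiles the new value before storing anything, so an invalid expression raises `PatternSyntaxException` and leaves the old setting intact. This eager validation is a design choice of the sketch, not documented behavior of `RegExp.setFind`; `FindSetterSketch` and its fixed replacement `#` are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class FindSetterSketch {
    private String find = "[0-9]+";
    private transient Pattern pattern;  // cached compiled form of find

    /** Compiles first, so a bad expression leaves both fields unchanged. */
    public void setFind(String value) {
        Pattern compiled = Pattern.compile(value);  // throws PatternSyntaxException if invalid
        find = value;
        pattern = compiled;                         // keep the cache in sync with find
    }

    public String getFind() {
        return find;
    }

    /** Replaces every match with a fixed "#" (hypothetical replacement). */
    public String clean(String token) {
        if (pattern == null) {
            pattern = Pattern.compile(find);
        }
        return pattern.matcher(token).replaceAll("#");
    }

    public static void main(String[] args) {
        FindSetterSketch s = new FindSetterSketch();
        System.out.println(s.clean("abc123"));  // prints "abc#"
        s.setFind("[a-z]+");
        System.out.println(s.clean("abc123"));  // prints "#123"
        try {
            s.setFind("(unclosed");
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getPattern());  // prints "rejected: (unclosed"
        }
        System.out.println(s.getFind());  // prints "[a-z]+" (unchanged)
    }
}
```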

findTipText
public String findTipText()
Returns the tip text for the find property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options

getDefaultReplace
protected String getDefaultReplace()
Returns the default expression used to replace matching tokens.
- Returns: the default replacement expression

setReplace
public void setReplace(String value)
Sets the expression used to replace matching tokens.
- Parameters: value - the replacement expression

getReplace
public String getReplace()
Returns the expression used to replace matching tokens.
- Returns: the replacement expression
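If the replacement string is handed to `java.util.regex.Matcher.replaceAll` (an assumption; this page does not confirm it), then `$n` refers to capture groups and a literal `$` or `\` must be escaped, which `Matcher.quoteReplacement` does. The hypothetical helper `apply` below demonstrates both cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceExpressionSketch {
    /** Applies find/replace using java.util.regex replacement syntax. */
    static String apply(String find, String replace, String token) {
        return Pattern.compile(find).matcher(token).replaceAll(replace);
    }

    public static void main(String[] args) {
        // Group references: swap the two halves of a "key=value" token.
        System.out.println(apply("(\\w+)=(\\w+)", "$2=$1", "lang=java"));  // prints "java=lang"
        // A literal "$" must be escaped; quoteReplacement does that for us.
        System.out.println(apply("[0-9]+", Matcher.quoteReplacement("$"), "cost99"));  // prints "cost$"
    }
}
```

Passing an unescaped `"$"` as the replacement makes `replaceAll` throw `IllegalArgumentException` once a match is found, which is worth keeping in mind when a replacement value comes from user-supplied options.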

replaceTipText
public String replaceTipText()
Returns the tip text for the replace property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options

clean
public String clean(String token)
Cleans the given token and returns the result, or null if the token is to be ignored.
- Specified by: clean in interface weka.core.tokenizers.cleaners.TokenCleaner
- Specified by: clean in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Parameters: token - the token to clean
- Returns: the cleaned token, or null to ignore the token
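Because `clean` may return `null` to signal "ignore this token", code that consumes a cleaner's output has to drop those entries rather than keep them. The sketch below models a cleaner as a plain `UnaryOperator<String>` to stay self-contained; `CleanerPipelineSketch`, `cleanAll` and the example cleaner (lower-case, discard pure numbers) are hypothetical and do not reflect how Weka's tokenizers actually invoke cleaners.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class CleanerPipelineSketch {
    /** Runs each token through the cleaner, dropping tokens it maps to null. */
    static List<String> cleanAll(List<String> tokens, UnaryOperator<String> cleaner) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = cleaner.apply(token);
            if (cleaned != null) {  // null means "ignore this token"
                kept.add(cleaned);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical cleaner: lower-cases tokens and discards purely numeric ones.
        UnaryOperator<String> cleaner = t -> t.matches("[0-9]+") ? null : t.toLowerCase();
        System.out.println(cleanAll(Arrays.asList("Weka", "42", "Tokens"), cleaner));  // prints "[weka, tokens]"
    }
}
```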