Package weka.core.tokenizers.cleaners
Class WordCluster
- java.lang.Object
  - weka.core.tokenizers.cleaners.AbstractTokenCleaner
    - weka.core.tokenizers.cleaners.WordCluster
- All Implemented Interfaces:
- Serializable, weka.core.OptionHandler, weka.core.tokenizers.cleaners.TokenCleaner
public class WordCluster extends weka.core.tokenizers.cleaners.AbstractTokenCleaner
Replaces words with clusters.
- Version:
- $Revision$
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
Field Summary
Fields
Modifier and Type               Field           Description
protected Map<String,String>    m_Clusters      the clusters (word -> cluster).
protected File                  m_Model         the model to use.
static String                   MODEL
static String                   UNKNOWN_WORD    in case the word has no cluster mapping.
-
Constructor Summary
Constructors
Constructor      Description
WordCluster()
-
Method Summary
Modifier and Type    Method                          Description
String               clean(String token)             Cleans the token (replaces the word with its cluster).
protected File       getDefaultModel()               Returns the default model file.
File                 getModel()                      Returns the model file to load and use.
String[]             getOptions()                    Gets the current option settings for the OptionHandler.
String               globalInfo()                    Returns a string describing the cleaner.
Enumeration          listOptions()                   Returns an enumeration describing the available options.
String               modelTipText()                  Returns the tip text for this property.
protected void       reset()                         Resets the scheme.
void                 setModel(File value)            Sets the model file to load and use.
void                 setOptions(String[] options)    Sets the OptionHandler's options using the given list.
-
Field Detail
-
MODEL
public static final String MODEL
- See Also:
- Constant Field Values
-
UNKNOWN_WORD
public static final String UNKNOWN_WORD
in case the word has no cluster mapping.
- See Also:
- Constant Field Values
-
m_Model
protected File m_Model
the model to use.
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the cleaner.
- Specified by:
- globalInfo in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Returns:
- a description suitable for displaying in the explorer/experimenter gui
-
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface weka.core.OptionHandler
- Overrides:
- listOptions in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).
- Specified by:
- setOptions in interface weka.core.OptionHandler
- Overrides:
- setOptions in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
-
getOptions
public String[] getOptions()
Gets the current option settings for the OptionHandler.
- Specified by:
- getOptions in interface weka.core.OptionHandler
- Overrides:
- getOptions in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Returns:
- the list of current option settings as an array of strings
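The remark that all options are set (or reset) on every setOptions call can be illustrated with a small stand-alone sketch. It does not use the Weka classes: the class name ModelOptionsSketch, the "-model" flag, and the default file name "default.model" are assumptions made for this example only, not the actual option name or default of WordCluster.

```java
import java.io.File;

/** Stand-alone sketch of non-incremental option handling (not the Weka implementation). */
public class ModelOptionsSketch {
  // Hypothetical default; the real default model file is not stated in this documentation.
  private static final File DEFAULT_MODEL = new File("default.model");

  private File model = DEFAULT_MODEL;

  /** Sets all options from the array; anything not supplied falls back to its default. */
  public void setOptions(String[] options) {
    model = DEFAULT_MODEL;  // reset first, so settings never accumulate across calls
    for (int i = 0; i < options.length - 1; i++) {
      if (options[i].equals("-model")) {  // hypothetical flag name
        model = new File(options[i + 1]);
        break;
      }
    }
  }

  /** Returns the current settings in the same form that setOptions accepts. */
  public String[] getOptions() {
    return new String[] {"-model", model.getPath()};
  }

  public static void main(String[] args) {
    ModelOptionsSketch sketch = new ModelOptionsSketch();
    sketch.setOptions(new String[] {"-model", "clusters.txt"});
    System.out.println(String.join(" ", sketch.getOptions()));
    sketch.setOptions(new String[0]);  // no options given: the model reverts to the default
    System.out.println(String.join(" ", sketch.getOptions()));
  }
}
```

Because the setter resets before parsing, feeding the result of getOptions back into setOptions should reproduce the same configuration, which is the usual reason for pairing the two methods.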
-
reset
protected void reset()
Resets the scheme.
- Overrides:
- reset in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
-
getDefaultModel
protected File getDefaultModel()
Returns the default model file.
- Returns:
- the default model file
-
setModel
public void setModel(File value)
Sets the model file to load and use.
- Parameters:
- value - the model
-
getModel
public File getModel()
Returns the model file to load and use.
- Returns:
- the model
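How the contents of a model file might end up in the word -> cluster map (m_Clusters) can be sketched as follows. The file format is not described in this documentation, so the sketch assumes one tab-separated word/cluster pair per line; the class and method names are hypothetical, and the actual loader may work differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone sketch of building a word -> cluster map (assumed file format). */
public class ClusterModelSketch {

  /** Parses "word<TAB>cluster" lines into a map; blank or malformed lines are skipped. */
  public static Map<String, String> parse(List<String> lines) {
    Map<String, String> clusters = new HashMap<>();
    for (String line : lines) {
      String[] parts = line.split("\t");
      if (parts.length == 2 && !parts[0].isEmpty()) {
        clusters.put(parts[0], parts[1]);
      }
    }
    return clusters;
  }

  public static void main(String[] args) {
    // Made-up sample content, for illustration only.
    List<String> lines = Arrays.asList("cat\tC1", "dog\tC1", "", "car\tC2");
    Map<String, String> clusters = parse(lines);
    System.out.println(clusters.get("dog"));
  }
}
```

For a real file, the lines could be obtained with java.nio.file.Files.readAllLines on the path of the File returned by getModel(), provided the file actually follows the assumed layout.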
-
modelTipText
public String modelTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
clean
public String clean(String token)
Cleans the token; in this class, the word is replaced with its cluster.
- Specified by:
- clean in interface weka.core.tokenizers.cleaners.TokenCleaner
- Specified by:
- clean in class weka.core.tokenizers.cleaners.AbstractTokenCleaner
- Parameters:
- token - the token to clean
- Returns:
- the clean token or null to ignore
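The class description (words are replaced with clusters) and the UNKNOWN_WORD field together suggest a lookup of roughly the following shape. This is a stand-alone sketch, not the actual implementation: the class name, the placeholder value "???", and the choice to return the placeholder for unmapped words are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-alone sketch of a word -> cluster lookup (not the Weka implementation). */
public class ClusterLookupSketch {
  /** Hypothetical placeholder; the real value of UNKNOWN_WORD is not given in this documentation. */
  public static final String UNKNOWN_WORD = "???";

  private final Map<String, String> clusters = new HashMap<>();

  /** Registers a mapping; in the real class the map would presumably be filled from the model file. */
  public void addMapping(String word, String cluster) {
    clusters.put(word, cluster);
  }

  /** Returns the cluster for the token, or the placeholder if the token has no mapping. */
  public String clean(String token) {
    return clusters.getOrDefault(token, UNKNOWN_WORD);
  }

  public static void main(String[] args) {
    ClusterLookupSketch sketch = new ClusterLookupSketch();
    sketch.addMapping("cat", "C1");  // made-up mapping for illustration
    System.out.println(sketch.clean("cat"));
    System.out.println(sketch.clean("zebra"));
  }
}
```

An implementation could equally return null for unmapped words so that they are ignored, which the return contract above allows; the sketch picks the placeholder only because a dedicated constant exists for that case.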