Package weka.core.tokenizers
Class PreCleanedTokenizer
- java.lang.Object
  - weka.core.tokenizers.Tokenizer
    - weka.core.tokenizers.PreCleanedTokenizer
- All Implemented Interfaces:
- Serializable, Enumeration<String>, weka.core.OptionHandler, weka.core.RevisionHandler
public class PreCleanedTokenizer extends weka.core.tokenizers.Tokenizer
Allows the cleaning of tokens before actual tokenization.
- Version:
- $Revision$
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
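The three properties of this class (pre tokenizer, cleaner, post tokenizer) suggest a three-stage pipeline. The following is a minimal, self-contained sketch of that idea and not the class's actual implementation. `PreCleanedPipelineSketch`, the stand-in functions, and the rule that an empty or null cleaned token is dropped are all assumptions made for illustration; whether the real class passes cleaned tokens to the post tokenizer one at a time is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in that mimics a pre-tokenize / clean / post-tokenize pipeline. */
public class PreCleanedPipelineSketch {

    // Stand-in pre tokenizer: split on whitespace.
    static final Function<String, List<String>> PRE =
        s -> Arrays.asList(s.trim().split("\\s+"));

    // Stand-in cleaner: strip everything except letters, digits, and hyphens.
    static final UnaryOperator<String> CLEANER =
        t -> t.replaceAll("[^A-Za-z0-9-]", "");

    // Stand-in post tokenizer: split on hyphens.
    static final Function<String, List<String>> POST =
        t -> Arrays.asList(t.split("-"));

    /** Runs the three stages in order and collects the final tokens. */
    static List<String> run(String input) {
        List<String> result = new ArrayList<>();
        for (String token : PRE.apply(input)) {
            String cleaned = CLEANER.apply(token);
            // Assumption: a null or empty result means "drop this token".
            if (cleaned == null || cleaned.isEmpty()) {
                continue;
            }
            // Assumption: each cleaned token is post-tokenized on its own.
            result.addAll(POST.apply(cleaned));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [state, of, the, art, really]
        System.out.println(run("state-of-the-art (really!) ..."));
    }
}
```

In this sketch the token `...` disappears because cleaning leaves nothing behind, which is the kind of noise a cleaning stage between two tokenizers is meant to remove.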
-
-
Field Summary
Fields (modifier and type, field, description):
- static String CLEANER
- protected TokenCleaner m_Cleaner - the cleaner to use.
- protected weka.core.tokenizers.Tokenizer m_PostTokenizer - the post tokenizer to use.
- protected weka.core.tokenizers.Tokenizer m_PreTokenizer - the pre tokenizer to use.
- static String POST_TOKENIZER
- static String PRE_TOKENIZER
-
Constructor Summary
Constructors:
- PreCleanedTokenizer()
-
Method Summary
Methods (modifier and type, method, description):
- String cleanerTipText() - Returns the tip text for this property.
- TokenCleaner getCleaner() - Returns the cleaner to use for cleaning the tokens from the initial tokenization.
- protected TokenCleaner getDefaultCleaner() - Returns the default cleaner.
- protected weka.core.tokenizers.Tokenizer getDefaultPostTokenizer() - Returns the default tokenizer for the final tokenization.
- protected weka.core.tokenizers.Tokenizer getDefaultPreTokenizer() - Returns the default tokenizer for the initial tokenization.
- String[] getOptions() - Gets the current option settings for the OptionHandler.
- weka.core.tokenizers.Tokenizer getPostTokenizer() - Returns the tokenizer to use for the final tokenization (after cleaning).
- weka.core.tokenizers.Tokenizer getPreTokenizer() - Returns the tokenizer to use for the initial tokenization (before cleaning).
- String getRevision() - Returns the revision string.
- String globalInfo() - Returns a string describing the tokenizer.
- boolean hasMoreElements() - Tests if this enumeration contains more elements.
- Enumeration listOptions() - Returns an enumeration describing the available options.
- String nextElement() - Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
- String postTokenizerTipText() - Returns the tip text for this property.
- String preTokenizerTipText() - Returns the tip text for this property.
- void setCleaner(TokenCleaner value) - Sets the cleaner to use for cleaning the tokens from the initial tokenization.
- void setOptions(String[] options) - Sets the OptionHandler's options using the given list.
- void setPostTokenizer(weka.core.tokenizers.Tokenizer value) - Sets the tokenizer to use for the final tokenization (after cleaning).
- void setPreTokenizer(weka.core.tokenizers.Tokenizer value) - Sets the tokenizer to use for the initial tokenization (before cleaning).
- void tokenize(String s) - Sets the string to tokenize.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface java.util.Enumeration
asIterator
-
-
-
-
Field Detail
-
PRE_TOKENIZER
public static final String PRE_TOKENIZER
- See Also:
- Constant Field Values
-
CLEANER
public static final String CLEANER
- See Also:
- Constant Field Values
-
POST_TOKENIZER
public static final String POST_TOKENIZER
- See Also:
- Constant Field Values
-
m_PreTokenizer
protected weka.core.tokenizers.Tokenizer m_PreTokenizer
the pre tokenizer to use.
-
m_Cleaner
protected TokenCleaner m_Cleaner
the cleaner to use.
-
m_PostTokenizer
protected weka.core.tokenizers.Tokenizer m_PostTokenizer
the post tokenizer to use.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the tokenizer.
- Specified by:
- globalInfo in class weka.core.tokenizers.Tokenizer
- Returns:
- a description suitable for displaying in the Explorer/Experimenter GUI
-
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface weka.core.OptionHandler
- Overrides:
- listOptions in class weka.core.tokenizers.Tokenizer
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Sets the OptionHandler's options using the given list. All options are set (or reset) during this call, so options cannot be set incrementally.
- Specified by:
- setOptions in interface weka.core.OptionHandler
- Overrides:
- setOptions in class weka.core.tokenizers.Tokenizer
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
-
getOptions
public String[] getOptions()
Gets the current option settings for the OptionHandler.
- Specified by:
- getOptions in interface weka.core.OptionHandler
- Overrides:
- getOptions in class weka.core.tokenizers.Tokenizer
- Returns:
- the list of current option settings as an array of strings
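The note that all options are set or reset in a single `setOptions` call can be illustrated with a small stand-in. This is only a sketch under stated assumptions: `OptionResetSketch`, the flags `-pre` and `-post`, and the default values are hypothetical and are not the flags or defaults of this class (the values of `PRE_TOKENIZER`, `CLEANER`, and `POST_TOKENIZER` are not shown on this page).

```java
/** Hypothetical stand-in showing non-incremental option handling. */
public class OptionResetSketch {

    // Hypothetical defaults, not taken from the real class.
    static final String DEFAULT_PRE = "default-pre";
    static final String DEFAULT_POST = "default-post";

    private String pre = DEFAULT_PRE;
    private String post = DEFAULT_POST;

    /** Returns the value following the flag, or the fallback if the flag is absent. */
    private static String valueOf(String flag, String[] options, String fallback) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                return options[i + 1];
            }
        }
        return fallback;
    }

    /** Every property is assigned on each call; absent flags fall back to the default. */
    public void setOptions(String[] options) {
        pre = valueOf("-pre", options, DEFAULT_PRE);
        post = valueOf("-post", options, DEFAULT_POST);
    }

    /** Returns the complete current configuration as flag/value pairs. */
    public String[] getOptions() {
        return new String[] {"-pre", pre, "-post", post};
    }

    public static void main(String[] args) {
        OptionResetSketch o = new OptionResetSketch();
        o.setOptions(new String[] {"-pre", "A", "-post", "B"});
        // The second call omits -pre, so that property returns to its default.
        o.setOptions(new String[] {"-post", "C"});
        // Prints: -pre default-pre -post C
        System.out.println(String.join(" ", o.getOptions()));
    }
}
```

To change a single setting while keeping the rest, a caller following this pattern would read the array from `getOptions()`, modify the relevant entry, and pass the whole array back to `setOptions`.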
-
getDefaultPreTokenizer
protected weka.core.tokenizers.Tokenizer getDefaultPreTokenizer()
Returns the default tokenizer for the initial tokenization (before cleaning).
- Returns:
- the default
-
setPreTokenizer
public void setPreTokenizer(weka.core.tokenizers.Tokenizer value)
Sets the tokenizer to use for the initial tokenization (before cleaning).
- Parameters:
- value - the tokenizer
-
getPreTokenizer
public weka.core.tokenizers.Tokenizer getPreTokenizer()
Returns the tokenizer to use for the initial tokenization (before cleaning).
- Returns:
- the tokenizer
-
preTokenizerTipText
public String preTokenizerTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getDefaultCleaner
protected TokenCleaner getDefaultCleaner()
Returns the default cleaner.
- Returns:
- the default
-
setCleaner
public void setCleaner(TokenCleaner value)
Sets the cleaner to use for cleaning the tokens from the initial tokenization.
- Parameters:
- value - the cleaner
-
getCleaner
public TokenCleaner getCleaner()
Returns the cleaner to use for cleaning the tokens from the initial tokenization.
- Returns:
- the cleaner
-
cleanerTipText
public String cleanerTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getDefaultPostTokenizer
protected weka.core.tokenizers.Tokenizer getDefaultPostTokenizer()
Returns the default tokenizer for the final tokenization (after cleaning).
- Returns:
- the default
-
setPostTokenizer
public void setPostTokenizer(weka.core.tokenizers.Tokenizer value)
Sets the tokenizer to use for the final tokenization (after cleaning).
- Parameters:
- value - the tokenizer
-
getPostTokenizer
public weka.core.tokenizers.Tokenizer getPostTokenizer()
Returns the tokenizer to use for the final tokenization (after cleaning).
- Returns:
- the tokenizer
-
postTokenizerTipText
public String postTokenizerTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
hasMoreElements
public boolean hasMoreElements()
Tests if this enumeration contains more elements.
- Specified by:
- hasMoreElements in interface Enumeration<String>
- Specified by:
- hasMoreElements in class weka.core.tokenizers.Tokenizer
- Returns:
- true if and only if this enumeration object contains at least one more element to provide; false otherwise.
-
nextElement
public String nextElement()
Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
- Specified by:
- nextElement in interface Enumeration<String>
- Specified by:
- nextElement in class weka.core.tokenizers.Tokenizer
- Returns:
- the next element of this enumeration.
-
tokenize
public void tokenize(String s)
Sets the string to tokenize. Tokenization happens immediately.
- Specified by:
- tokenize in class weka.core.tokenizers.Tokenizer
- Parameters:
- s - the string to tokenize
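Because tokenization happens immediately, a caller would typically call `tokenize` once and then drain the tokens through `hasMoreElements()` and `nextElement()`. The sketch below shows that calling pattern with a self-contained stand-in. `EagerTokenizerSketch` and its whitespace splitting are hypothetical, and the behaviour that a second `tokenize` call discards unread tokens is an assumption of this sketch, not something this page states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in: tokenizes eagerly, then serves tokens as an Enumeration. */
public class EagerTokenizerSketch implements Enumeration<String> {

    private final Deque<String> tokens = new ArrayDeque<>();

    /** Splits the string right away and buffers the tokens. */
    public void tokenize(String s) {
        // Assumption: a new call discards tokens left over from the previous one.
        tokens.clear();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
    }

    @Override
    public boolean hasMoreElements() {
        return !tokens.isEmpty();
    }

    @Override
    public String nextElement() {
        if (tokens.isEmpty()) {
            throw new NoSuchElementException("no more tokens");
        }
        return tokens.remove();
    }

    /** Collects all remaining elements of an enumeration into a list. */
    static List<String> drain(Enumeration<String> e) {
        List<String> out = new ArrayList<>();
        while (e.hasMoreElements()) {
            out.add(e.nextElement());
        }
        return out;
    }

    public static void main(String[] args) {
        EagerTokenizerSketch tokenizer = new EagerTokenizerSketch();
        tokenizer.tokenize("clean  these tokens");
        // Prints: [clean, these, tokens]
        System.out.println(drain(tokenizer));
    }
}
```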
-
getRevision
public String getRevision()
Returns the revision string.
- Returns:
- the revision
-
-