Package weka.core.tokenizers
Class TwitterNLPTokenizer
- java.lang.Object
  - weka.core.tokenizers.Tokenizer
    - weka.core.tokenizers.TwitterNLPTokenizer
- All Implemented Interfaces:
- Serializable, Enumeration<String>, weka.core.OptionHandler, weka.core.RevisionHandler
public class TwitterNLPTokenizer extends weka.core.tokenizers.Tokenizer
Tokenizer using TweetNLP's Twokenize. Taken from here
- Version:
- $Revision$
- Author:
- Felipe Bravo
- See Also:
- Serialized Form
-
-
Field Summary
Fields
static String CLEANER
protected weka.core.tokenizers.cleaners.TokenCleaner m_Cleaner
the cleaner to use.
protected Iterator<String> m_TokenIterator
the iterator for the tokens.
protected boolean m_UseLowerCase
whether to lower-case the tweet.
static String USE_LOWER_CASE
-
Constructor Summary
Constructors
TwitterNLPTokenizer()
-
Method Summary
Methods
String cleanerTipText()
Returns the tip text for this property.
weka.core.tokenizers.cleaners.TokenCleaner getCleaner()
Returns the token cleaner to use.
protected weka.core.tokenizers.cleaners.TokenCleaner getDefaultCleaner()
Returns the default token cleaner.
String[] getOptions()
Gets the current option settings for the OptionHandler.
String getRevision()
Returns the revision string.
boolean getUseLowerCase()
Returns whether to use lower case.
String globalInfo()
Returns a string describing the tokenizer.
boolean hasMoreElements()
Tests if this enumeration contains more elements.
Enumeration listOptions()
Returns an enumeration describing the available options.
static void main(String[] args)
Runs the tokenizer with the given options and strings to tokenize.
String nextElement()
Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
void setCleaner(weka.core.tokenizers.cleaners.TokenCleaner value)
Sets the token cleaner to use.
void setOptions(String[] options)
Sets the OptionHandler's options using the given list.
void setUseLowerCase(boolean value)
Sets whether to use lower case.
void tokenize(String s)
Sets the string to tokenize.
String useLowerCaseTipText()
Returns the tip text for this property.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface java.util.Enumeration
asIterator
-
Field Detail
-
CLEANER
public static final String CLEANER
- See Also:
- Constant Field Values
-
USE_LOWER_CASE
public static final String USE_LOWER_CASE
- See Also:
- Constant Field Values
-
m_UseLowerCase
protected boolean m_UseLowerCase
whether to lower-case the tweet.
-
m_Cleaner
protected weka.core.tokenizers.cleaners.TokenCleaner m_Cleaner
the cleaner to use.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the tokenizer.
- Specified by:
- globalInfo in class weka.core.tokenizers.Tokenizer
- Returns:
- a description suitable for displaying in the Explorer/Experimenter GUI
-
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface weka.core.OptionHandler
- Overrides:
- listOptions in class weka.core.tokenizers.Tokenizer
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).
- Specified by:
- setOptions in interface weka.core.OptionHandler
- Overrides:
- setOptions in class weka.core.tokenizers.Tokenizer
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
-
getOptions
public String[] getOptions()
Gets the current option settings for the OptionHandler.
- Specified by:
- getOptions in interface weka.core.OptionHandler
- Overrides:
- getOptions in class weka.core.tokenizers.Tokenizer
- Returns:
- the list of current option settings as an array of strings
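
The "set (or reset)" behaviour described for setOptions can be easier to see in code. The following is a minimal sketch using a hypothetical stand-in class, not the Weka class itself: the class name `OptionsSketch` and the flag `-lower` are assumptions made for illustration, since the actual values of USE_LOWER_CASE and CLEANER are not given on this page. Validation of unsupported options, which the documented method reports by throwing an Exception, is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in illustrating "set or reset" option handling; not the Weka class. */
class OptionsSketch {
  // Hypothetical flag name, chosen for this sketch only.
  static final String LOWER_FLAG = "-lower";

  private boolean useLowerCase = false;

  /** Every call starts from the defaults, so a flag missing from the array is reset. */
  public void setOptions(String[] options) {
    useLowerCase = Arrays.asList(options).contains(LOWER_FLAG);
  }

  /** Returns the current settings in the same form that setOptions accepts. */
  public String[] getOptions() {
    List<String> result = new ArrayList<>();
    if (useLowerCase) {
      result.add(LOWER_FLAG);
    }
    return result.toArray(new String[0]);
  }

  public static void main(String[] args) {
    OptionsSketch sketch = new OptionsSketch();
    sketch.setOptions(new String[] {LOWER_FLAG});
    System.out.println(Arrays.toString(sketch.getOptions())); // prints [-lower]
    sketch.setOptions(new String[0]);                         // flag omitted, so it is reset
    System.out.println(Arrays.toString(sketch.getOptions())); // prints []
  }
}
```

Because each call replaces the whole configuration, a caller that wants to change one setting would typically read the current array with getOptions, modify it, and pass the full array back.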
-
setUseLowerCase
public void setUseLowerCase(boolean value)
Sets whether to use lower case.
- Parameters:
- value - true if lower case should be used
-
getUseLowerCase
public boolean getUseLowerCase()
Returns whether to use lower case.
- Returns:
- true if lower case is used
-
useLowerCaseTipText
public String useLowerCaseTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getDefaultCleaner
protected weka.core.tokenizers.cleaners.TokenCleaner getDefaultCleaner()
Returns the default token cleaner.
- Returns:
- the default token cleaner
-
setCleaner
public void setCleaner(weka.core.tokenizers.cleaners.TokenCleaner value)
Sets the token cleaner to use.
- Parameters:
- value - the cleaner
-
getCleaner
public weka.core.tokenizers.cleaners.TokenCleaner getCleaner()
Returns the token cleaner to use.
- Returns:
- the cleaner
-
cleanerTipText
public String cleanerTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
hasMoreElements
public boolean hasMoreElements()
Tests if this enumeration contains more elements.
- Specified by:
- hasMoreElements in interface Enumeration<String>
- Specified by:
- hasMoreElements in class weka.core.tokenizers.Tokenizer
- Returns:
- true if and only if this enumeration object contains at least one more element to provide; false otherwise.
-
nextElement
public String nextElement()
Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
- Specified by:
- nextElement in interface Enumeration<String>
- Specified by:
- nextElement in class weka.core.tokenizers.Tokenizer
- Returns:
- the next element of this enumeration.
-
tokenize
public void tokenize(String s)
Sets the string to tokenize. Tokenization happens immediately.
- Specified by:
- tokenize in class weka.core.tokenizers.Tokenizer
- Parameters:
- s - the string to tokenize
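
The calling pattern implied by tokenize, hasMoreElements and nextElement is: set the string, then drain the enumeration. The sketch below illustrates that pattern with a hypothetical stand-in class (`SketchTokenizer`, an assumption for this example) so that it runs without Weka on the classpath. It splits on whitespace, which is a deliberate simplification and not TweetNLP's Twokenize, and it mirrors only the m_UseLowerCase and m_TokenIterator fields described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in that mirrors the documented contract; not the Weka class. */
class SketchTokenizer implements Enumeration<String> {
  // Mirrors m_UseLowerCase: whether to lower-case the input before splitting.
  private boolean useLowerCase = false;
  // Mirrors m_TokenIterator: iterates over the tokens of the latest tokenize() call.
  private Iterator<String> tokenIterator = Collections.emptyIterator();

  public void setUseLowerCase(boolean value) {
    useLowerCase = value;
  }

  /** Tokenizes immediately; whitespace splitting stands in for Twokenize here. */
  public void tokenize(String s) {
    String text = useLowerCase ? s.toLowerCase(Locale.ROOT) : s;
    List<String> tokens = new ArrayList<>();
    for (String t : text.trim().split("\\s+")) {
      if (!t.isEmpty()) {
        tokens.add(t);
      }
    }
    tokenIterator = tokens.iterator();
  }

  @Override
  public boolean hasMoreElements() {
    return tokenIterator.hasNext();
  }

  @Override
  public String nextElement() {
    // Throws NoSuchElementException once the tokens are exhausted.
    return tokenIterator.next();
  }

  /** Convenience helper for the sketch: tokenize, then drain into a list. */
  static List<String> collect(SketchTokenizer tk, String s) {
    tk.tokenize(s);
    List<String> out = new ArrayList<>();
    while (tk.hasMoreElements()) {
      out.add(tk.nextElement());
    }
    return out;
  }

  public static void main(String[] args) {
    SketchTokenizer tk = new SketchTokenizer();
    tk.setUseLowerCase(true);
    System.out.println(collect(tk, "Loving the NEW phone")); // prints [loving, the, new, phone]
  }
}
```

With the real class, the same loop would presumably apply after constructing a TwitterNLPTokenizer and calling setUseLowerCase and tokenize, though the tokens produced by Twokenize will differ from a plain whitespace split for text containing emoticons, URLs or hashtags.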
-
getRevision
public String getRevision()
Returns the revision string.
- Returns:
- the revision
-
main
public static void main(String[] args)
Runs the tokenizer with the given options and strings to tokenize. The tokens are printed to stdout.
- Parameters:
- args - the commandline options and strings to tokenize
-
-