public class CharacterNGramTokenizer extends Tokenizer
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
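
To illustrate what the -min and -max bounds mean, here is a minimal sketch in plain Java that collects every character n-gram whose length lies between the two bounds. The class `CharNGramSketch` and the helper `charNGrams` are hypothetical names introduced for illustration and are not part of this class. The emission order (all sizes at one start position before moving to the next) is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not the tokenizer documented on this page.
final class CharNGramSketch {

    // Hypothetical helper: returns all substrings of s with length in [min, max].
    // Assumes 1 <= min <= max; the emission order is an assumption.
    static List<String> charNGrams(String s, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            // Grow the gram from min to max, stopping at the end of the string.
            for (int n = min; n <= max && start + n <= s.length(); n++) {
                grams.add(s.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Uses the defaults listed above: min = 1, max = 3.
        System.out.println(charNGrams("abcd", 1, 3));
        // prints [a, ab, abc, b, bc, bcd, c, cd, d]
    }
}
```
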
| Constructor and Description |
|---|
| CharacterNGramTokenizer() |
| Modifier and Type | Method | Description |
|---|---|---|
| int | getNGramMaxSize() | Gets the max N of the NGram. |
| int | getNGramMinSize() | Gets the min N of the NGram. |
| String[] | getOptions() | Gets the current option settings for the OptionHandler. |
| String | getRevision() | Returns the revision string. |
| String | globalInfo() | Returns a string describing the tokenizer. |
| boolean | hasMoreElements() | Returns true if there are more elements available. |
| Enumeration<Option> | listOptions() | Returns an enumeration of all the available options. |
| static void | main(String[] args) | Runs the tokenizer with the given options and strings to tokenize. |
| String | nextElement() | Returns N-grams and also (N-1)-grams and ... |
| String | NGramMaxSizeTipText() | Returns the tip text for this property. |
| String | NGramMinSizeTipText() | Returns the tip text for this property. |
| void | setNGramMaxSize(int value) | Sets the max size of the Ngram. |
| void | setNGramMinSize(int value) | Sets the min size of the Ngram. |
| void | setOptions(String[] options) | Parses a given list of options. |
| void | tokenize(String s) | Sets the string to tokenize. |
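
The methods tokenize, hasMoreElements and nextElement follow the usual `java.util.Enumeration` pattern: set the string, then pull tokens until none remain. The sketch below shows that calling pattern with a stand-in class, `SketchCharTokenizer`, which is hypothetical and written only for this illustration. Its internal bookkeeping and the order in which it emits tokens are assumptions and may differ from the documented class.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;

// Stand-in for illustration only; not the class documented on this page.
class SketchCharTokenizer implements Enumeration<String> {
    private final int min;      // smallest gram size, assumed >= 1
    private final int max;      // largest gram size
    private String text = "";   // string currently being tokenized
    private int position = 0;   // start index of the next gram
    private int n;              // size of the next gram

    SketchCharTokenizer(int min, int max) {
        this.min = min;
        this.max = max;
        this.n = min;
    }

    // Sets the string to tokenize and resets the iteration state.
    void tokenize(String s) {
        text = s;
        position = 0;
        n = min;
    }

    @Override
    public boolean hasMoreElements() {
        return position + n <= text.length();
    }

    @Override
    public String nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        String gram = text.substring(position, position + n);
        n++;
        // Move to the next start position once the sizes are exhausted.
        if (n > max || position + n > text.length()) {
            n = min;
            position++;
        }
        return gram;
    }

    public static void main(String[] args) {
        SketchCharTokenizer tokenizer = new SketchCharTokenizer(1, 2);
        tokenizer.tokenize("abc");
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreElements()) {
            tokens.add(tokenizer.nextElement());
        }
        System.out.println(tokens); // prints [a, ab, b, bc, c]
    }
}
```

Keeping the state in `position` and `n` lets the caller pull one token at a time without materializing the full list, which is the main reason to expose an Enumeration rather than return a collection.
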
Methods inherited from class Tokenizer: runTokenizer, tokenize

public String globalInfo()
globalInfo in class Tokenizer

public Enumeration<Option> listOptions()
listOptions in interface OptionHandler
listOptions in class Tokenizer

public String[] getOptions()
getOptions in interface OptionHandler
getOptions in class Tokenizer

public void setOptions(String[] options) throws Exception
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
setOptions in interface OptionHandler
setOptions in class Tokenizer
options - the list of options as an array of strings
Exception - if an option is not supported

public int getNGramMaxSize()

public void setNGramMaxSize(int value)
value - the size of the NGram.

public String NGramMaxSizeTipText()

public void setNGramMinSize(int value)
value - the size of the NGram.

public int getNGramMinSize()

public String NGramMinSizeTipText()

public boolean hasMoreElements()
hasMoreElements in interface Enumeration<String>
hasMoreElements in class Tokenizer

public String nextElement()
nextElement in interface Enumeration<String>
nextElement in class Tokenizer

public void tokenize(String s)

public String getRevision()

public static void main(String[] args)
args - the commandline options and strings to tokenize

Copyright © 2016 University of Waikato, Hamilton, NZ. All Rights Reserved.