Package adams.flow.transformer.splitter
Class StanfordPTBTokenizer
java.lang.Object
  adams.core.logging.LoggingObject
    adams.core.logging.CustomLoggingLevelObject
      adams.core.option.AbstractOptionHandler
        adams.flow.transformer.splitter.AbstractDocumentToSentences
          adams.flow.transformer.splitter.StanfordPTBTokenizer
All Implemented Interfaces: adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.SizeOfHandler, Serializable
public class StanfordPTBTokenizer extends AbstractDocumentToSentences
Uses Stanford's PTBTokenizer.
For more details on the options see:
http://nlp.stanford.edu/software/tokenizer.shtml
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
    The logging level for outputting errors and debugging output.
    default: WARNING
-splitter-options <java.lang.String> (property: splitterOptions)
    The splitter options to use.
    default: normalizeParentheses=false,normalizeOtherBrackets=false,invertible=true
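The default value of splitterOptions is a comma-separated list of key=value pairs, which matches the options-string format used by Stanford's PTBTokenizer. The following is a minimal sketch of how such a string breaks down, using only the JDK. The class SplitterOptionsSketch and its parse method are hypothetical illustrations and are not part of ADAMS or Stanford CoreNLP; treating a bare key as "true" is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of ADAMS or Stanford CoreNLP. */
public class SplitterOptionsSketch {

    /** Splits "key=value,key2=value2" into an insertion-ordered map. */
    public static Map<String, String> parse(String options) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : options.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty segments, e.g. from a trailing comma
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                // Assumption: a bare key without "=" is treated as enabled.
                result.put(trimmed, "true");
            } else {
                result.put(trimmed.substring(0, eq).trim(),
                           trimmed.substring(eq + 1).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Default value of the splitterOptions property, as listed above.
        String defaults =
            "normalizeParentheses=false,normalizeOtherBrackets=false,invertible=true";
        parse(defaults).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

A string in this form would be passed to setSplitterOptions(String); how the class forwards it to the tokenizer factory is not shown on this page.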
Version: $Revision: 11956 $
Author: fracpete (fracpete at waikato dot ac dot nz)
See Also: Serialized Form
Field Summary
Modifier and Type | Field | Description
protected String | m_SplitterOptions | the options for the splitter.
protected edu.stanford.nlp.process.TokenizerFactory | m_TokenizerFactory | the tokenizer factory to use.
Constructor Summary
Constructor | Description
StanfordPTBTokenizer() |
Method Summary
Modifier and Type | Method | Description
void | defineOptions() | Adds options to the internal list of options.
protected List<String> | doSplit(String doc) | Performs the actual splitting.
String | getSplitterOptions() | Returns the splitter options to use.
protected edu.stanford.nlp.process.TokenizerFactory | getTokenizerFactory() | Returns the tokenizer factory to use.
String | globalInfo() | Returns a string describing the object.
protected void | reset() | Resets the scheme.
void | setSplitterOptions(String value) | Sets the splitter options to use.
String | splitterOptionsTipText() | Returns the tip text for this property.
Methods inherited from class adams.flow.transformer.splitter.AbstractDocumentToSentences
check, split
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
Field Detail

m_SplitterOptions
protected String m_SplitterOptions
the options for the splitter.

m_TokenizerFactory
protected transient edu.stanford.nlp.process.TokenizerFactory m_TokenizerFactory
the tokenizer factory to use.
Method Detail

globalInfo
public String globalInfo()
Returns a string describing the object.
Specified by: globalInfo in interface adams.core.GlobalInfoSupporter
Specified by: globalInfo in class adams.core.option.AbstractOptionHandler
Returns: a description suitable for displaying in the GUI

defineOptions
public void defineOptions()
Adds options to the internal list of options.
Specified by: defineOptions in interface adams.core.option.OptionHandler
Overrides: defineOptions in class adams.core.option.AbstractOptionHandler

reset
protected void reset()
Resets the scheme.
Overrides: reset in class adams.core.option.AbstractOptionHandler

setSplitterOptions
public void setSplitterOptions(String value)
Sets the splitter options to use.
Parameters: value - the options

getSplitterOptions
public String getSplitterOptions()
Returns the splitter options to use.
Returns: the options

splitterOptionsTipText
public String splitterOptionsTipText()
Returns the tip text for this property.
Returns: tip text for this property suitable for displaying in the GUI or for listing the options.

getTokenizerFactory
protected edu.stanford.nlp.process.TokenizerFactory getTokenizerFactory()
Returns the tokenizer factory to use.
Returns: the factory

doSplit
protected List<String> doSplit(String doc)
Performs the actual splitting.
Specified by: doSplit in class AbstractDocumentToSentences
Parameters: doc - the document to split
Returns: the list of sentence strings