Class StanfordPTBTokenizer

  • All Implemented Interfaces:
    adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.SizeOfHandler, Serializable

    public class StanfordPTBTokenizer
    extends AbstractDocumentToSentences
    Uses Stanford's PTBTokenizer.

    For more details on the options see:
    http://nlp.stanford.edu/software/tokenizer.shtml

    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -splitter-options <java.lang.String> (property: splitterOptions)
        The splitter options to use.
        default: normalizeParentheses=false,normalizeOtherBrackets=false,invertible=true
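
The default value above is a comma-separated list of key=value pairs. The following is a minimal sketch of how such a string breaks down into individual settings. The class `SplitterOptionsSketch` and its `parse` method are hypothetical illustrations and are not part of ADAMS or the Stanford library; treating a bare key without `=` as `true` is an assumption based on how the Stanford tokenizer documentation describes its options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of ADAMS or Stanford CoreNLP). */
final class SplitterOptionsSketch {

  /**
   * Splits "key=value,key2=value2" into an ordered map.
   * Assumption: a bare key without '=' is interpreted as "true".
   */
  static Map<String, String> parse(String options) {
    Map<String, String> result = new LinkedHashMap<>();
    if (options == null || options.trim().isEmpty())
      return result;
    for (String part : options.split(",")) {
      String entry = part.trim();
      if (entry.isEmpty())
        continue;
      int eq = entry.indexOf('=');
      if (eq < 0)
        result.put(entry, "true");
      else
        result.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
    }
    return result;
  }

  public static void main(String[] args) {
    // The default value documented for -splitter-options.
    String defaults = "normalizeParentheses=false,normalizeOtherBrackets=false,invertible=true";
    parse(defaults).forEach((key, value) -> System.out.println(key + " -> " + value));
  }
}
```

The actual parsing is done inside Stanford's tokenizer; this sketch only shows the shape of the string that the splitterOptions property expects.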
     
    Version:
    $Revision: 11956 $
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields
      Modifier and Type                                      Field                 Description
      protected String                                       m_SplitterOptions     the options for the splitter.
      protected edu.stanford.nlp.process.TokenizerFactory    m_TokenizerFactory    the tokenizer factory to use.
      • Fields inherited from class adams.core.option.AbstractOptionHandler

        m_OptionManager
      • Fields inherited from class adams.core.logging.LoggingObject

        m_Logger, m_LoggingIsEnabled, m_LoggingLevel
    • Field Detail

      • m_SplitterOptions

        protected String m_SplitterOptions
        the options for the splitter.
      • m_TokenizerFactory

        protected transient edu.stanford.nlp.process.TokenizerFactory m_TokenizerFactory
        the tokenizer factory to use.
    • Constructor Detail

      • StanfordPTBTokenizer

        public StanfordPTBTokenizer()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the object.
        Specified by:
        globalInfo in interface adams.core.GlobalInfoSupporter
        Specified by:
        globalInfo in class adams.core.option.AbstractOptionHandler
        Returns:
        a description suitable for displaying in the GUI
      • defineOptions

        public void defineOptions()
        Adds options to the internal list of options.
        Specified by:
        defineOptions in interface adams.core.option.OptionHandler
        Overrides:
        defineOptions in class adams.core.option.AbstractOptionHandler
      • reset

        protected void reset()
        Resets the scheme.
        Overrides:
        reset in class adams.core.option.AbstractOptionHandler
      • setSplitterOptions

        public void setSplitterOptions(String value)
        Sets the splitter options to use.
        Parameters:
        value - the options
      • getSplitterOptions

        public String getSplitterOptions()
        Returns the splitter options to use.
        Returns:
        the options
      • splitterOptionsTipText

        public String splitterOptionsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getTokenizerFactory

        protected edu.stanford.nlp.process.TokenizerFactory getTokenizerFactory()
        Returns the tokenizer factory to use.
        Returns:
        the factory
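
The field detail marks m_TokenizerFactory as transient, and the class overrides reset(). This suggests, although this page does not confirm it, that the factory is built on first use and discarded whenever the options change. The sketch below illustrates that general lazy-initialization pattern. `LazyFactorySketch` and its `builder` function are hypothetical stand-ins, not the ADAMS implementation, and no Stanford classes are used.

```java
import java.util.function.Function;

/** Hypothetical stand-in, not the ADAMS source: caches a factory built from an options string. */
final class LazyFactorySketch<F> {

  /** The options string, analogous to m_SplitterOptions. */
  private String options;

  /** Builds a factory from the options (assumed; stands in for the Stanford call). */
  private final Function<String, F> builder;

  /** Cached factory; transient so it is skipped during serialization and rebuilt on demand. */
  private transient F factory;

  LazyFactorySketch(String options, Function<String, F> builder) {
    this.options = options;
    this.builder = builder;
  }

  /** Changing the options invalidates the cached factory. */
  void setOptions(String value) {
    options = value;
    reset();
  }

  String getOptions() {
    return options;
  }

  /** Drops the cached factory so the next access rebuilds it. */
  void reset() {
    factory = null;
  }

  /** Returns the cached factory, creating it on first use. */
  F getFactory() {
    if (factory == null)
      factory = builder.apply(options);
    return factory;
  }
}
```

With this pattern, repeated calls to the getter reuse one instance, and a deserialized object rebuilds its factory on first access because the transient field comes back as null.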