Class AbstractDocumentToSentences

  • All Implemented Interfaces:
    adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.SizeOfHandler, Serializable
    Direct Known Subclasses:
    StanfordPTBTokenizer

    public abstract class AbstractDocumentToSentences
    extends adams.core.option.AbstractOptionHandler
    Ancestor for classes that split document strings into sentences.
    Version:
    $Revision: 10826 $
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      • Fields inherited from class adams.core.option.AbstractOptionHandler

        m_OptionManager
      • Fields inherited from class adams.core.logging.LoggingObject

        m_Logger, m_LoggingIsEnabled, m_LoggingLevel
    • Method Summary

      Modifier and Type | Method | Description
      protected void | check(String doc) | Checks the document.
      protected abstract List<String> | doSplit(String doc) | Performs the actual splitting.
      List<String> | split(String doc) | Splits the given document string into sentences.
      • Methods inherited from class adams.core.option.AbstractOptionHandler

        cleanUpOptions, defineOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, globalInfo, initialize, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
      • Methods inherited from class adams.core.logging.LoggingObject

        configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
      • Methods inherited from interface adams.core.logging.LoggingLevelHandler

        getLoggingLevel
    • Constructor Detail

      • AbstractDocumentToSentences

        public AbstractDocumentToSentences()
    • Method Detail

      • check

        protected void check(String doc)
        Checks the document.

        The default implementation only checks that a document string was provided.
        Parameters:
        doc - the document to check
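
        As a minimal sketch of how a subclass might tighten this check, the example below uses a hypothetical stand-in base class (SplitterSketch) instead of the real ADAMS class, so it compiles without the library. The exception type (IllegalStateException), the order in which split calls check and doSplit, and the extra blank-document rule are all assumptions made for illustration, not documented behavior.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for AbstractDocumentToSentences; NOT the ADAMS class.
abstract class SplitterSketch {
  // Mirrors the documented default: only checks that a document was provided.
  // The exception type is an assumption.
  protected void check(String doc) {
    if (doc == null)
      throw new IllegalStateException("No document string provided!");
  }

  protected abstract List<String> doSplit(String doc);

  // Assumed flow: validate first, then delegate to doSplit.
  public List<String> split(String doc) {
    check(doc);
    return doSplit(doc);
  }
}

// Hypothetical subclass that additionally rejects blank documents.
class NonBlankSplitter extends SplitterSketch {
  @Override
  protected void check(String doc) {
    super.check(doc);  // keep the default "document provided" check
    if (doc.trim().isEmpty())
      throw new IllegalStateException("Document string is blank!");
  }

  @Override
  protected List<String> doSplit(String doc) {
    // Placeholder logic: treats the whole document as a single sentence.
    return Collections.singletonList(doc.trim());
  }
}
```

        Calling super.check(doc) first keeps the inherited validation in place, so the subclass only adds its own rule on top.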
      • doSplit

        protected abstract List<String> doSplit(String doc)
        Performs the actual splitting.
        Parameters:
        doc - the document to split
        Returns:
        the list of sentence strings
      • split

        public List<String> split(String doc)
        Splits the given document string into sentences.
        Parameters:
        doc - the document to process
        Returns:
        the generated sentences
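
        The three methods above suggest a template-method arrangement: callers use split, and subclasses supply the splitting logic in doSplit. The sketch below illustrates that arrangement with a hypothetical stand-in base class (DocumentToSentencesSketch) and a hypothetical subclass built on the JDK's java.text.BreakIterator. It is not the ADAMS implementation and is unrelated to StanfordPTBTokenizer; the call order inside split and the exception type are assumptions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for AbstractDocumentToSentences; NOT the ADAMS class.
abstract class DocumentToSentencesSketch {
  // Assumed default check: a document string must be provided.
  protected void check(String doc) {
    if (doc == null)
      throw new IllegalStateException("No document string provided!");
  }

  protected abstract List<String> doSplit(String doc);

  // Assumed flow: check the document, then perform the actual splitting.
  public List<String> split(String doc) {
    check(doc);
    return doSplit(doc);
  }
}

// Hypothetical subclass that finds sentence boundaries with BreakIterator.
class BreakIteratorSentences extends DocumentToSentencesSketch {
  @Override
  protected List<String> doSplit(String doc) {
    List<String> result = new ArrayList<>();
    BreakIterator iter = BreakIterator.getSentenceInstance(Locale.ENGLISH);
    iter.setText(doc);
    int start = iter.first();
    // Walk consecutive boundaries; each [start, end) span is one sentence.
    for (int end = iter.next(); end != BreakIterator.DONE; start = end, end = iter.next()) {
      String sentence = doc.substring(start, end).trim();
      if (!sentence.isEmpty())
        result.add(sentence);
    }
    return result;
  }
}

class SplitDemo {
  public static void main(String[] args) {
    DocumentToSentencesSketch splitter = new BreakIteratorSentences();
    // Prints each detected sentence on its own line.
    for (String s : splitter.split("This is one sentence. Here is another one."))
      System.out.println(s);
  }
}
```

        Keeping validation in split and the algorithm in doSplit means every subclass gets the same input checking without having to repeat it.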