Class WekaTokenizer

  • All Implemented Interfaces:
    adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.SizeOfHandler, Serializable

    public class WekaTokenizer
    extends AbstractTokenizer
    Uses the specified Weka tokenizer.

    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -tokenizer <weka.core.tokenizers.Tokenizer> (property: tokenizer)
        The tokenizer to use.
        default: weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"
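
The default above is printed in escaped command-line form. Assuming the usual unescaping, the delimiter set is space, carriage return, line feed, tab and the characters .,;:'"()?! The following minimal sketch illustrates how such a delimiter set splits a string. It uses only java.util.StringTokenizer as a stand-in and does not call Weka or ADAMS; the names DelimiterSketch, DELIMITERS and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterSketch {

  // Assumed unescaped form of the default -delimiters value shown above.
  static final String DELIMITERS = " \r\n\t.,;:'\"()?!";

  /** Splits the text on any of the delimiter characters, dropping empty tokens. */
  static List<String> split(String text) {
    List<String> tokens = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(text, DELIMITERS);
    while (st.hasMoreTokens())
      tokens.add(st.nextToken());
    return tokens;
  }

  public static void main(String[] args) {
    // prints [Hello, world, It, s, a, test]
    System.out.println(split("Hello, world! (It's a test.)"));
  }
}
```

Note that the apostrophe is itself a delimiter in this set, so a contraction such as "It's" is split into two tokens.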
     
    Version:
    $Revision: 10826 $
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Modifier and Type                          Field        Description
      protected weka.core.tokenizers.Tokenizer   m_Tokenizer  The tokenizer to use.
      • Fields inherited from class adams.core.option.AbstractOptionHandler

        m_OptionManager
      • Fields inherited from class adams.core.logging.LoggingObject

        m_Logger, m_LoggingIsEnabled, m_LoggingLevel
    • Constructor Summary

      Constructor       Description
      WekaTokenizer()
    • Method Summary

      Modifier and Type                Method                                              Description
      void                             defineOptions()                                     Adds options to the internal list of options.
      protected List<String>           doTokenize(String str)                              Performs the actual tokenization.
      weka.core.tokenizers.Tokenizer   getTokenizer()                                      Returns the tokenizer to use.
      String                           globalInfo()                                        Returns a string describing the object.
      void                             setTokenizer(weka.core.tokenizers.Tokenizer value)  Sets the tokenizer to use.
      String                           tokenizerTipText()                                  Returns the tip text for this property.
      • Methods inherited from class adams.core.option.AbstractOptionHandler

        cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, initialize, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
      • Methods inherited from class adams.core.logging.LoggingObject

        configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
      • Methods inherited from interface adams.core.logging.LoggingLevelHandler

        getLoggingLevel
    • Field Detail

      • m_Tokenizer

        protected weka.core.tokenizers.Tokenizer m_Tokenizer
        The tokenizer to use.
    • Constructor Detail

      • WekaTokenizer

        public WekaTokenizer()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the object.
        Specified by:
        globalInfo in interface adams.core.GlobalInfoSupporter
        Specified by:
        globalInfo in class adams.core.option.AbstractOptionHandler
        Returns:
        a description suitable for displaying in the GUI
      • defineOptions

        public void defineOptions()
        Adds options to the internal list of options.
        Specified by:
        defineOptions in interface adams.core.option.OptionHandler
        Overrides:
        defineOptions in class adams.core.option.AbstractOptionHandler
      • setTokenizer

        public void setTokenizer(weka.core.tokenizers.Tokenizer value)
        Sets the tokenizer to use.
        Parameters:
        value - the tokenizer
      • getTokenizer

        public weka.core.tokenizers.Tokenizer getTokenizer()
        Returns the tokenizer to use.
        Returns:
        the tokenizer
      • tokenizerTipText

        public String tokenizerTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • doTokenize

        protected List<String> doTokenize(String str)
        Performs the actual tokenization.
        Specified by:
        doTokenize in class AbstractTokenizer
        Parameters:
        str - the string to tokenize
        Returns:
        the list of tokens
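
The body of doTokenize is not shown on this page. Weka tokenizers follow the java.util.Enumeration pattern: tokenize(String) is called first, then tokens are read with hasMoreElements() and nextElement(). Assuming the wrapper simply drains that enumeration into a list, the logic might look like the sketch below. StandInTokenizer is a hypothetical stand-in written for this example, not the Weka class, and DoTokenizeSketch is not part of ADAMS.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.StringTokenizer;

public class DoTokenizeSketch {

  /** Hypothetical stand-in for an Enumeration-style tokenizer (not the Weka class). */
  static class StandInTokenizer implements Enumeration<String> {
    private StringTokenizer m_Inner = new StringTokenizer("");

    /** Resets the tokenizer so that it iterates over the given string. */
    void tokenize(String s) {
      m_Inner = new StringTokenizer(s);  // splits on whitespace only
    }

    @Override
    public boolean hasMoreElements() {
      return m_Inner.hasMoreTokens();
    }

    @Override
    public String nextElement() {
      return m_Inner.nextToken();
    }
  }

  /** Feeds the string to the tokenizer and collects all tokens in order. */
  static List<String> doTokenize(StandInTokenizer tokenizer, String str) {
    List<String> result = new ArrayList<>();
    tokenizer.tokenize(str);
    while (tokenizer.hasMoreElements())
      result.add(tokenizer.nextElement());
    return result;
  }

  public static void main(String[] args) {
    // prints [the, quick, brown, fox]
    System.out.println(doTokenize(new StandInTokenizer(), "the quick brown fox"));
  }
}
```

Because tokenize resets the stand-in's internal state, the same tokenizer instance can be reused across calls in this sketch; whether the real class shares one tokenizer instance between calls or threads is not stated on this page.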