Class PreCleanedTokenizer

  • All Implemented Interfaces:
    Serializable, Enumeration<String>, weka.core.OptionHandler, weka.core.RevisionHandler

    public class PreCleanedTokenizer
    extends weka.core.tokenizers.Tokenizer
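    Tokenizes in three stages: a pre-tokenizer performs the initial tokenization, a cleaner cleans the resulting tokens, and a post-tokenizer performs the final tokenization.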
    Version:
    $Revision$
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
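
The three stages described on this page (initial tokenization, cleaning, final tokenization) can be pictured with the plain-Java sketch below. It does not use the Weka classes. `PreCleanSketch`, its whitespace splitter, and its letters-only cleaner are hypothetical stand-ins, and the way cleaned tokens are handed to the final stage (rejoined with single spaces) is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the three-stage flow; not the Weka classes. */
class PreCleanSketch {

  /** Stand-in tokenizer: splits on runs of whitespace, skipping empty pieces. */
  static List<String> splitOnWhitespace(String s) {
    List<String> result = new ArrayList<>();
    for (String piece : s.split("\\s+")) {
      if (!piece.isEmpty()) {
        result.add(piece);
      }
    }
    return result;
  }

  /** Stand-in cleaner: keeps ASCII letters only, lower-cased; null means "drop this token". */
  static String clean(String token) {
    String cleaned = token.replaceAll("[^A-Za-z]", "").toLowerCase(Locale.ROOT);
    return cleaned.isEmpty() ? null : cleaned;
  }

  /** Runs pre-tokenization, cleaning, and post-tokenization in sequence. */
  static List<String> run(String s) {
    StringBuilder joined = new StringBuilder();
    for (String token : splitOnWhitespace(s)) {   // stage 1: initial tokenization
      String cleaned = clean(token);              // stage 2: cleaning
      if (cleaned == null) {
        continue;
      }
      if (joined.length() > 0) {
        joined.append(' ');                       // assumption: rejoin with single spaces
      }
      joined.append(cleaned);
    }
    return splitOnWhitespace(joined.toString());  // stage 3: final tokenization
  }
}
```

With these stand-ins, `PreCleanSketch.run("Hello, World! 42")` yields the tokens `hello` and `world`; the purely numeric token is dropped by the cleaner.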
    • Field Detail

      • m_PreTokenizer

        protected weka.core.tokenizers.Tokenizer m_PreTokenizer
        The tokenizer used for the initial tokenization (before cleaning).
      • m_Cleaner

        protected TokenCleaner m_Cleaner
        The cleaner applied to the tokens from the initial tokenization.
      • m_PostTokenizer

        protected weka.core.tokenizers.Tokenizer m_PostTokenizer
        The tokenizer used for the final tokenization (after cleaning).
    • Constructor Detail

      • PreCleanedTokenizer

        public PreCleanedTokenizer()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the tokenizer.
        Specified by:
        globalInfo in class weka.core.tokenizers.Tokenizer
        Returns:
        a description suitable for displaying in the explorer/experimenter gui
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.core.tokenizers.Tokenizer
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions(String[] options)
                        throws Exception
        Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.core.tokenizers.Tokenizer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • getOptions

        public String[] getOptions()
        Gets the current option settings for the OptionHandler.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.core.tokenizers.Tokenizer
        Returns:
        the list of current option settings as an array of strings
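
The "set (or reset)" behaviour of setOptions means that any option missing from the array falls back to its default instead of keeping its previous value. The plain-Java sketch below illustrates that contract only. `ResettingOptionsSketch`, the `-pre` and `-post` flags, and the default values are hypothetical; the actual flags of this class are whatever listOptions() reports.

```java
/** Hypothetical holder that mimics "all options are set or reset on every call". */
class ResettingOptionsSketch {

  static final String DEFAULT_PRE = "default-pre";    // made-up default value
  static final String DEFAULT_POST = "default-post";  // made-up default value

  private String pre = DEFAULT_PRE;
  private String post = DEFAULT_POST;

  /** Resets every option to its default, then applies the flag/value pairs given. */
  void setOptions(String[] options) {
    pre = DEFAULT_PRE;
    post = DEFAULT_POST;
    for (int i = 0; i + 1 < options.length; i += 2) {
      if ("-pre".equals(options[i])) {
        pre = options[i + 1];
      } else if ("-post".equals(options[i])) {
        post = options[i + 1];
      } else {
        // Assumption for this sketch: unknown flags are rejected outright.
        throw new IllegalArgumentException("Unsupported option: " + options[i]);
      }
    }
  }

  /** Returns the complete current state as flag/value pairs. */
  String[] getOptions() {
    return new String[] {"-pre", pre, "-post", post};
  }
}
```

Calling `setOptions` with `-pre A -post B` and then again with only `-post C` leaves `-pre` at its default rather than at `A`. To change a single option while keeping the rest, a caller would start from the array returned by `getOptions()`, modify it, and pass the whole array back.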
      • getDefaultPreTokenizer

        protected weka.core.tokenizers.Tokenizer getDefaultPreTokenizer()
        Returns the default tokenizer for the initial tokenization (before cleaning).
        Returns:
        the default pre-tokenizer
      • setPreTokenizer

        public void setPreTokenizer(weka.core.tokenizers.Tokenizer value)
        Sets the tokenizer to use for the initial tokenization (before cleaning).
        Parameters:
        value - the tokenizer
      • getPreTokenizer

        public weka.core.tokenizers.Tokenizer getPreTokenizer()
        Returns the tokenizer to use for the initial tokenization (before cleaning).
        Returns:
        the tokenizer
      • preTokenizerTipText

        public String preTokenizerTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultCleaner

        protected TokenCleaner getDefaultCleaner()
        Returns the default cleaner.
        Returns:
        the default cleaner
      • setCleaner

        public void setCleaner(TokenCleaner value)
        Sets the cleaner to use for cleaning the tokens from the initial tokenization.
        Parameters:
        value - the cleaner
      • getCleaner

        public TokenCleaner getCleaner()
        Returns the cleaner to use for cleaning the tokens from the initial tokenization.
        Returns:
        the cleaner
      • cleanerTipText

        public String cleanerTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getDefaultPostTokenizer

        protected weka.core.tokenizers.Tokenizer getDefaultPostTokenizer()
        Returns the default tokenizer for the final tokenization (after cleaning).
        Returns:
        the default post-tokenizer
      • setPostTokenizer

        public void setPostTokenizer(weka.core.tokenizers.Tokenizer value)
        Sets the tokenizer to use for the final tokenization (after cleaning).
        Parameters:
        value - the tokenizer
      • getPostTokenizer

        public weka.core.tokenizers.Tokenizer getPostTokenizer()
        Returns the tokenizer to use for the final tokenization (after cleaning).
        Returns:
        the tokenizer
      • postTokenizerTipText

        public String postTokenizerTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • hasMoreElements

        public boolean hasMoreElements()
        Tests if this enumeration contains more elements.
        Specified by:
        hasMoreElements in interface Enumeration<String>
        Specified by:
        hasMoreElements in class weka.core.tokenizers.Tokenizer
        Returns:
        true if and only if this enumeration object contains at least one more element to provide; false otherwise.
      • nextElement

        public String nextElement()
        Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
        Specified by:
        nextElement in interface Enumeration<String>
        Specified by:
        nextElement in class weka.core.tokenizers.Tokenizer
        Returns:
        the next element of this enumeration.
      • tokenize

        public void tokenize(String s)
        Sets the string to tokenize. Tokenization happens immediately.
        Specified by:
        tokenize in class weka.core.tokenizers.Tokenizer
        Parameters:
        s - the string to tokenize
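
Because tokenization happens immediately, a caller invokes tokenize once and then drains the result through hasMoreElements and nextElement. The plain-Java sketch below shows that calling pattern against a hypothetical stand-in, `EagerTokenEnumeration`, which splits on whitespace. It is not the Weka class, and throwing NoSuchElementException when exhausted follows the general java.util.Enumeration contract rather than anything stated on this page.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in: tokenizes eagerly, then serves tokens as an Enumeration. */
class EagerTokenEnumeration implements Enumeration<String> {

  private List<String> tokens = new ArrayList<>();
  private int index = 0;

  /** Tokenizes right away and discards any tokens left over from a previous call. */
  public void tokenize(String s) {
    tokens = new ArrayList<>();
    index = 0;
    for (String piece : s.split("\\s+")) {
      if (!piece.isEmpty()) {
        tokens.add(piece);
      }
    }
  }

  @Override
  public boolean hasMoreElements() {
    return index < tokens.size();
  }

  @Override
  public String nextElement() {
    if (!hasMoreElements()) {
      throw new NoSuchElementException("no more tokens");
    }
    return tokens.get(index++);
  }

  /** Typical consumer loop: collect everything the enumeration still holds. */
  static List<String> drain(Enumeration<String> e) {
    List<String> result = new ArrayList<>();
    while (e.hasMoreElements()) {
      result.add(e.nextElement());
    }
    return result;
  }
}
```

In this sketch a second call to `tokenize` replaces the pending tokens, so one instance can be reused for several strings as long as each string is drained before the next call.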
      • getRevision

        public String getRevision()
        Returns the revision string.
        Returns:
        the revision