Class NormalizeDuplicateChars

  • All Implemented Interfaces:
    Serializable, weka.core.OptionHandler, TokenCleaner

    public class NormalizeDuplicateChars
    extends AbstractTokenCleaner
    Replaces each run of consecutive duplicate characters with a single character, e.g., 'oooooh noooo!!!!' becomes 'oh no!'.
    Version:
    $Revision$
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Constructor Detail

      • NormalizeDuplicateChars

        public NormalizeDuplicateChars()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the cleaner.
        Specified by:
        globalInfo in class AbstractTokenCleaner
        Returns:
        a description suitable for displaying in the explorer/experimenter GUI
      • clean

        public String clean(String token)
        Cleans the token by replacing each run of duplicate characters with a single character.
        Specified by:
        clean in interface TokenCleaner
        Specified by:
        clean in class AbstractTokenCleaner
        Parameters:
        token - the token to clean
        Returns:
        the cleaned token, or null if the token should be ignored
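
The effect described for clean can be illustrated without the library on the classpath. The following is a minimal sketch, assuming the cleaner collapses runs of identical consecutive characters via a regular-expression back-reference; the actual implementation may differ. The class DuplicateCharsSketch and the method collapse are hypothetical names used only for illustration and are not part of the documented API.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the documented cleaner; not the library's code.
public class DuplicateCharsSketch {

    // Matches any character followed by one or more repeats of itself.
    private static final Pattern RUN = Pattern.compile("(.)\\1+");

    // Assumption: a null token is passed through as null ("ignore").
    static String collapse(String token) {
        if (token == null) {
            return null;
        }
        // Replace each run with the single captured character.
        return RUN.matcher(token).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Example taken from the class description.
        System.out.println(collapse("oooooh noooo!!!!")); // prints "oh no!"
    }
}
```

Under this assumption, legitimate doubled letters are collapsed as well: collapse("book") yields "bok".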