Class WordFrequencyAnalyzer

  • All Implemented Interfaces:
    adams.core.AdditionalInformationHandler, adams.core.CleanUpHandler, adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.io.EncodingSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.QuickInfoSupporter, adams.core.ShallowCopySupporter<adams.flow.core.Actor>, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.VariablesInspectionHandler, adams.event.VariableChangeListener, adams.flow.core.Actor, adams.flow.core.ErrorHandler, adams.flow.core.InputConsumer, adams.flow.core.OutputProducer, Serializable, Comparable

    public class WordFrequencyAnalyzer
    extends adams.flow.transformer.AbstractTransformer
    implements adams.core.io.EncodingSupporter
    Generates word frequencies from the incoming text.

    Input/output:
    - accepts:
       java.lang.String
    - generates:
       com.kennycason.kumo.WordFrequency[]


    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -name <java.lang.String> (property: name)
        The name of the actor.
        default: WordFrequencyAnalyzer
     
    -annotation <adams.core.base.BaseAnnotation> (property: annotations)
        The annotations to attach to this actor.
        default:
     
    -skip <boolean> (property: skip)
        If set to true, transformation is skipped and the input token is just forwarded
        as it is.
        default: false
     
    -stop-flow-on-error <boolean> (property: stopFlowOnError)
        If set to true, the flow execution at this level gets stopped in case this
        actor encounters an error; the error gets propagated; useful for critical
        actors.
        default: false
     
    -silent <boolean> (property: silent)
        If enabled, then no errors are output in the console. Note: the enclosing
        actor handler must have this enabled as well.
        default: false
     
    -encoding <adams.core.base.BaseCharset> (property: encoding)
        The type of encoding to use when writing to the file; use empty string for
        default.
        default: Default
     
    -normalizer <com.kennycason.kumo.nlp.normalize.Normalizer> [-normalizer ...] (property: normalizers)
        The normalizers to use.
        default:
     
    -min-word-length <int> (property: minWordLength)
        The minimum length for words.
        default: 3
        minimum: 1
     
    -max-word-length <int> (property: maxWordLength)
        The maximum length for words.
        default: 32
        minimum: 1
     
    -num-frequencies <int> (property: numFrequencies)
        The number of frequencies to return.
        default: 50
        minimum: 1
     
    -stopwords <adams.flow.control.StorageName> (property: stopwords)
        The storage item that holds the string array of stopwords to use.
        default: storage
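
The filtering options above (-min-word-length, -max-word-length, -num-frequencies, -stopwords) can be pictured with a plain-Java sketch. It does not use ADAMS or Kumo: the class WordFrequencySketch, its method topFrequencies, the tokenization rule, the lower-casing, and the tie-breaking order are all assumptions made for illustration, and the actor's actual behavior may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for illustration; not the ADAMS or Kumo implementation. */
final class WordFrequencySketch {

  /** Returns up to numFrequencies (word, count) entries, most frequent first. */
  static List<Map.Entry<String, Integer>> topFrequencies(
      String text, int minWordLength, int maxWordLength,
      int numFrequencies, String[] stopwords) {
    Set<String> stop = new HashSet<>(Arrays.asList(stopwords));
    Map<String, Integer> counts = new HashMap<>();
    // Assumption: lower-case the text and split on anything that is not a letter or digit.
    for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
      // Drop words outside the [minWordLength, maxWordLength] range (also drops empty tokens).
      if (word.length() < minWordLength || word.length() > maxWordLength)
        continue;
      // Drop stopwords.
      if (stop.contains(word))
        continue;
      counts.merge(word, 1, Integer::sum);
    }
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
    // Highest count first; ties are broken alphabetically to get a deterministic order.
    sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
        .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
    // Keep at most numFrequencies entries.
    return new ArrayList<>(sorted.subList(0, Math.min(numFrequencies, sorted.size())));
  }
}
```

Under these assumptions, calling topFrequencies("The cat and the hat and the bat", 3, 32, 2, new String[]{"the"}) yields the entry for "and" (count 2) followed by the entry for "bat" (count 1).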
     
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected adams.core.base.BaseCharset m_Encoding
      the encoding to use.
      protected int m_MaxWordLength
      the max word length.
      protected int m_MinWordLength
      the min word length.
      protected com.kennycason.kumo.nlp.normalize.Normalizer[] m_Normalizers
      the normalizers to use.
      protected int m_NumFrequencies
      the number of frequencies to return.
      protected adams.flow.control.StorageName m_Stopwords
      the stopwords to retrieve from storage.
      • Fields inherited from class adams.flow.transformer.AbstractTransformer

        BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
      • Fields inherited from class adams.flow.core.AbstractActor

        m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
      • Fields inherited from class adams.core.option.AbstractOptionHandler

        m_OptionManager
      • Fields inherited from class adams.core.logging.LoggingObject

        m_Logger, m_LoggingIsEnabled, m_LoggingLevel
      • Fields inherited from interface adams.flow.core.Actor

        FILE_EXTENSION, FILE_EXTENSION_GZ
    • Method Summary

      Modifier and Type Method Description
      Class[] accepts()
      Returns the class that the consumer accepts.
      void defineOptions()
      Adds options to the internal list of options.
      protected String doExecute()
      Executes the flow item.
      String encodingTipText()
      Returns the tip text for this property.
      Class[] generates()
      Returns the class of objects that it generates.
      adams.core.base.BaseCharset getEncoding()
      Returns the encoding to use.
      int getMaxWordLength()
      Returns the maximum length for words.
      int getMinWordLength()
      Returns the minimum length for words.
      com.kennycason.kumo.nlp.normalize.Normalizer[] getNormalizers()
      Returns the normalizers to use.
      int getNumFrequencies()
      Returns the number of frequencies to return.
      String getQuickInfo()
      Returns a quick info about the object, which can be displayed in the GUI.
      adams.flow.control.StorageName getStopwords()
      Returns the storage item that holds the string array of stopwords to use.
      String globalInfo()
      Returns a string describing the object.
      String maxWordLengthTipText()
      Returns the tip text for this property.
      String minWordLengthTipText()
      Returns the tip text for this property.
      String normalizersTipText()
      Returns the tip text for this property.
      String numFrequenciesTipText()
      Returns the tip text for this property.
      void setEncoding(adams.core.base.BaseCharset value)
      Sets the encoding to use.
      void setMaxWordLength(int value)
      Sets the maximum length for words.
      void setMinWordLength(int value)
      Sets the minimum length for words.
      void setNormalizers(com.kennycason.kumo.nlp.normalize.Normalizer[] value)
      Sets the normalizers to use.
      void setNumFrequencies(int value)
      Sets the number of frequencies to return.
      void setStopwords(adams.flow.control.StorageName value)
      Sets the storage item that holds the string array of stopwords to use.
      String stopwordsTipText()
      Returns the tip text for this property.
      • Methods inherited from class adams.flow.transformer.AbstractTransformer

        backupState, currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute, restoreState, wrapUp
      • Methods inherited from class adams.flow.core.AbstractActor

        annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, initialize, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, pruneBackup, reset, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
      • Methods inherited from class adams.core.option.AbstractOptionHandler

        cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
      • Methods inherited from class adams.core.logging.LoggingObject

        getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
      • Methods inherited from interface adams.flow.core.Actor

        cleanUp, compareTo, destroy, equals, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged
      • Methods inherited from interface adams.core.AdditionalInformationHandler

        getAdditionalInformation
      • Methods inherited from interface adams.core.logging.LoggingLevelHandler

        getLoggingLevel, setLoggingLevel
      • Methods inherited from interface adams.core.logging.LoggingSupporter

        getLogger, isLoggingEnabled
      • Methods inherited from interface adams.core.option.OptionHandler

        cleanUpOptions, getOptionManager
      • Methods inherited from interface adams.core.VariablesInspectionHandler

        canInspectOptions
    • Field Detail

      • m_Encoding

        protected adams.core.base.BaseCharset m_Encoding
        the encoding to use.
      • m_Normalizers

        protected com.kennycason.kumo.nlp.normalize.Normalizer[] m_Normalizers
        the normalizers to use.
      • m_MinWordLength

        protected int m_MinWordLength
        the min word length.
      • m_MaxWordLength

        protected int m_MaxWordLength
        the max word length.
      • m_NumFrequencies

        protected int m_NumFrequencies
        the number of frequencies to return.
      • m_Stopwords

        protected adams.flow.control.StorageName m_Stopwords
        the stopwords to retrieve from storage.
    • Constructor Detail

      • WordFrequencyAnalyzer

        public WordFrequencyAnalyzer()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the object.
        Specified by:
        globalInfo in interface adams.core.GlobalInfoSupporter
        Specified by:
        globalInfo in class adams.core.option.AbstractOptionHandler
        Returns:
        a description suitable for displaying in the gui
      • defineOptions

        public void defineOptions()
        Adds options to the internal list of options.
        Specified by:
        defineOptions in interface adams.core.option.OptionHandler
        Overrides:
        defineOptions in class adams.flow.core.AbstractActor
      • setEncoding

        public void setEncoding(adams.core.base.BaseCharset value)
        Sets the encoding to use.
        Specified by:
        setEncoding in interface adams.core.io.EncodingSupporter
        Parameters:
        value - the encoding, e.g. "UTF-8" or "UTF-16", empty string for default
      • getEncoding

        public adams.core.base.BaseCharset getEncoding()
        Returns the encoding to use.
        Specified by:
        getEncoding in interface adams.core.io.EncodingSupporter
        Returns:
        the encoding, e.g. "UTF-8" or "UTF-16", empty string for default
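
The "empty string for default" convention described for the encoding can be illustrated with the standard java.nio.charset API. This is only a sketch: the class EncodingSketch and its resolve method are hypothetical, and how adams.core.base.BaseCharset resolves its value internally is not shown in this documentation.

```java
import java.nio.charset.Charset;

/** Hypothetical helper; not part of ADAMS. */
final class EncodingSketch {

  /** Resolves a charset name; an empty (or null) name selects the platform default. */
  static Charset resolve(String name) {
    if (name == null || name.isEmpty())
      return Charset.defaultCharset();
    // Throws an unchecked exception if the name is illegal or unsupported.
    return Charset.forName(name);
  }
}
```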
      • encodingTipText

        public String encodingTipText()
        Returns the tip text for this property.
        Specified by:
        encodingTipText in interface adams.core.io.EncodingSupporter
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setNormalizers

        public void setNormalizers(com.kennycason.kumo.nlp.normalize.Normalizer[] value)
        Sets the normalizers to use.
        Parameters:
        value - the normalizers
      • getNormalizers

        public com.kennycason.kumo.nlp.normalize.Normalizer[] getNormalizers()
        Returns the normalizers to use.
        Returns:
        the normalizers
      • normalizersTipText

        public String normalizersTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setMinWordLength

        public void setMinWordLength(int value)
        Sets the minimum length for words.
        Parameters:
        value - the minimum
      • getMinWordLength

        public int getMinWordLength()
        Returns the minimum length for words.
        Returns:
        the minimum
      • minWordLengthTipText

        public String minWordLengthTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setMaxWordLength

        public void setMaxWordLength(int value)
        Sets the maximum length for words.
        Parameters:
        value - the maximum
      • getMaxWordLength

        public int getMaxWordLength()
        Returns the maximum length for words.
        Returns:
        the maximum
      • maxWordLengthTipText

        public String maxWordLengthTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setNumFrequencies

        public void setNumFrequencies(int value)
        Sets the number of frequencies to return.
        Parameters:
        value - the number of frequencies
      • getNumFrequencies

        public int getNumFrequencies()
        Returns the number of frequencies to return.
        Returns:
        the number of frequencies
      • numFrequenciesTipText

        public String numFrequenciesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setStopwords

        public void setStopwords(adams.flow.control.StorageName value)
        Sets the storage item that holds the string array of stopwords to use.
        Parameters:
        value - the storage name
      • getStopwords

        public adams.flow.control.StorageName getStopwords()
        Returns the storage item that holds the string array of stopwords to use.
        Returns:
        the storage name
      • stopwordsTipText

        public String stopwordsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getQuickInfo

        public String getQuickInfo()
        Returns a quick info about the object, which can be displayed in the GUI.
        Specified by:
        getQuickInfo in interface adams.flow.core.Actor
        Specified by:
        getQuickInfo in interface adams.core.QuickInfoSupporter
        Overrides:
        getQuickInfo in class adams.flow.core.AbstractActor
        Returns:
        null if no info available, otherwise short string
      • accepts

        public Class[] accepts()
        Returns the class that the consumer accepts.
        Specified by:
        accepts in interface adams.flow.core.InputConsumer
        Returns:
        the Class of objects that can be processed
      • generates

        public Class[] generates()
        Returns the class of objects that it generates.
        Specified by:
        generates in interface adams.flow.core.OutputProducer
        Returns:
        the Class of the generated tokens
      • doExecute

        protected String doExecute()
        Executes the flow item.
        Specified by:
        doExecute in class adams.flow.core.AbstractActor
        Returns:
        null if everything is fine, otherwise error message
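
The return contract of doExecute (null on success, an error message otherwise) can be sketched in isolation as follows. The class ExecuteConventionSketch, the empty-input check, and the message text are assumptions for illustration only; the actor's actual checks are not documented here.

```java
/** Hypothetical illustration of the "null means success" convention; not ADAMS code. */
final class ExecuteConventionSketch {

  /** Returns null if everything is fine, otherwise an error message. */
  static String doExecute(String input) {
    // Assumed validation step: reject missing or blank text.
    if (input == null || input.trim().isEmpty())
      return "No text to analyze!";
    // The real work would happen here.
    return null;
  }

  /** A caller treats any non-null result as a failure. */
  static boolean succeeded(String result) {
    return result == null;
  }
}
```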