Package adams.flow.transformer
Class WordFrequencyAnalyzer
- java.lang.Object
- adams.core.logging.LoggingObject
- adams.core.logging.CustomLoggingLevelObject
- adams.core.option.AbstractOptionHandler
- adams.flow.core.AbstractActor
- adams.flow.transformer.AbstractTransformer
- adams.flow.transformer.WordFrequencyAnalyzer
- All Implemented Interfaces:
- adams.core.AdditionalInformationHandler, adams.core.CleanUpHandler, adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.io.EncodingSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.QuickInfoSupporter, adams.core.ShallowCopySupporter<adams.flow.core.Actor>, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.VariablesInspectionHandler, adams.event.VariableChangeListener, adams.flow.core.Actor, adams.flow.core.ErrorHandler, adams.flow.core.InputConsumer, adams.flow.core.OutputProducer, Serializable, Comparable
public class WordFrequencyAnalyzer extends adams.flow.transformer.AbstractTransformer implements adams.core.io.EncodingSupporter
Generates word frequencies from the incoming text.
Input/output:
- accepts:
java.lang.String
- generates:
com.kennycason.kumo.WordFrequency[]
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-name <java.lang.String> (property: name) The name of the actor. default: WordFrequencyAnalyzer
-annotation <adams.core.base.BaseAnnotation> (property: annotations) The annotations to attach to this actor. default:
-skip <boolean> (property: skip) If set to true, transformation is skipped and the input token is just forwarded as it is. default: false
-stop-flow-on-error <boolean> (property: stopFlowOnError) If set to true, the flow execution at this level gets stopped in case this actor encounters an error; the error gets propagated; useful for critical actors. default: false
-silent <boolean> (property: silent) If enabled, then no errors are output in the console; Note: the enclosing actor handler must have this enabled as well. default: false
-encoding <adams.core.base.BaseCharset> (property: encoding) The type of encoding to use when writing to the file, use empty string for default. default: Default
-normalizer <com.kennycason.kumo.nlp.normalize.Normalizer> [-normalizer ...] (property: normalizers) The normalizers to use. default:
-min-word-length <int> (property: minWordLength) The minimum length for words. default: 3 minimum: 1
-max-word-length <int> (property: maxWordLength) The maximum length for words. default: 32 minimum: 1
-num-frequencies <int> (property: numFrequencies) The number of frequencies to return. default: 50 minimum: 1
-stopwords <adams.flow.control.StorageName> (property: stopwords) The storage item that holds the string array of stopwords to use. default: storage
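The following is a minimal, self-contained sketch of how options such as -min-word-length, -max-word-length, -num-frequencies and -stopwords might interact. It does not use ADAMS or Kumo: the class WordFrequencySketch, the method topFrequencies, the lower-casing step and the tokenization rule are all assumptions made for illustration, not the actor's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for illustration only; not part of ADAMS or Kumo. */
class WordFrequencySketch {

  /**
   * Counts the words in the text and returns at most n entries, most frequent first.
   * Words shorter than minLen, longer than maxLen, or listed as stopwords are ignored.
   */
  static List<Map.Entry<String, Integer>> topFrequencies(
      String text, Set<String> stopwords, int minLen, int maxLen, int n) {
    Map<String, Integer> counts = new HashMap<>();
    // Assumption: words are lower-cased and split on anything that is not a letter or digit.
    for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
      if (token.length() < minLen || token.length() > maxLen)
        continue;  // outside the allowed word length window
      if (stopwords.contains(token))
        continue;  // explicitly excluded
      counts.merge(token, 1, Integer::sum);
    }
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
    // Highest count first; ties are broken alphabetically so the order is deterministic.
    sorted.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });
    return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
  }

  public static void main(String[] args) {
    Set<String> stopwords = new HashSet<>(Arrays.asList("the", "and"));
    // 3, 32 and 50 mirror the documented defaults of the three integer options.
    for (Map.Entry<String, Integer> e
        : topFrequencies("The flow reads the text and counts the words in the text",
                         stopwords, 3, 32, 50)) {
      System.out.println(e.getKey() + ": " + e.getValue());
    }
  }
}
```

In the actor itself, the stopwords are not passed in directly but looked up as a string array under the storage name given by -stopwords.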
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields (modifier and type, field, description):
- protected adams.core.base.BaseCharset m_Encoding: the encoding to use.
- protected int m_MaxWordLength: the max word length.
- protected int m_MinWordLength: the min word length.
- protected com.kennycason.kumo.nlp.normalize.Normalizer[] m_Normalizers: the normalizers to use.
- protected int m_NumFrequencies: the number of frequencies to return.
- protected adams.flow.control.StorageName m_Stopwords: the stopwords to retrieve from storage.
-
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
-
Fields inherited from class adams.flow.core.AbstractActor
m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
-
-
Constructor Summary
Constructors:
- WordFrequencyAnalyzer()
-
Method Summary
Methods (modifier and type, method, description):
- Class[] accepts(): Returns the class that the consumer accepts.
- void defineOptions(): Adds options to the internal list of options.
- protected String doExecute(): Executes the flow item.
- String encodingTipText(): Returns the tip text for this property.
- Class[] generates(): Returns the class of objects that it generates.
- adams.core.base.BaseCharset getEncoding(): Returns the encoding to use.
- int getMaxWordLength(): Returns the maximum length for words.
- int getMinWordLength(): Returns the minimum length for words.
- com.kennycason.kumo.nlp.normalize.Normalizer[] getNormalizers(): Returns the normalizers to use.
- int getNumFrequencies(): Returns the number of frequencies to return.
- String getQuickInfo(): Returns a quick info about the object, which can be displayed in the GUI.
- adams.flow.control.StorageName getStopwords(): Returns the storage item that holds the string array of stopwords to use.
- String globalInfo(): Returns a string describing the object.
- String maxWordLengthTipText(): Returns the tip text for this property.
- String minWordLengthTipText(): Returns the tip text for this property.
- String normalizersTipText(): Returns the tip text for this property.
- String numFrequenciesTipText(): Returns the tip text for this property.
- void setEncoding(adams.core.base.BaseCharset value): Sets the encoding to use.
- void setMaxWordLength(int value): Sets the maximum length for words.
- void setMinWordLength(int value): Sets the minimum length for words.
- void setNormalizers(com.kennycason.kumo.nlp.normalize.Normalizer[] value): Sets the normalizers to use.
- void setNumFrequencies(int value): Sets the number of frequencies to return.
- void setStopwords(adams.flow.control.StorageName value): Sets the storage item that holds the string array of stopwords to use.
- String stopwordsTipText(): Returns the tip text for this property.
-
Methods inherited from class adams.flow.transformer.AbstractTransformer
backupState, currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute, restoreState, wrapUp
-
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, initialize, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, pruneBackup, reset, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.flow.core.Actor
cleanUp, compareTo, destroy, equals, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setUp, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged
-
-
-
-
Field Detail
-
m_Encoding
protected adams.core.base.BaseCharset m_Encoding
the encoding to use.
-
m_Normalizers
protected com.kennycason.kumo.nlp.normalize.Normalizer[] m_Normalizers
the normalizers to use.
-
m_MinWordLength
protected int m_MinWordLength
the min word length.
-
m_MaxWordLength
protected int m_MaxWordLength
the max word length.
-
m_NumFrequencies
protected int m_NumFrequencies
the number of frequencies to return.
-
m_Stopwords
protected adams.flow.control.StorageName m_Stopwords
the stopwords to retrieve from storage.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
- globalInfo in interface adams.core.GlobalInfoSupporter
- Specified by:
- globalInfo in class adams.core.option.AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
- defineOptions in interface adams.core.option.OptionHandler
- Overrides:
- defineOptions in class adams.flow.core.AbstractActor
-
setEncoding
public void setEncoding(adams.core.base.BaseCharset value)
Sets the encoding to use.
- Specified by:
- setEncoding in interface adams.core.io.EncodingSupporter
- Parameters:
- value - the encoding, e.g. "UTF-8" or "UTF-16", empty string for default
-
getEncoding
public adams.core.base.BaseCharset getEncoding()
Returns the encoding to use.
- Specified by:
- getEncoding in interface adams.core.io.EncodingSupporter
- Returns:
- the encoding, e.g. "UTF-8" or "UTF-16", empty string for default
-
encodingTipText
public String encodingTipText()
Returns the tip text for this property.
- Specified by:
- encodingTipText in interface adams.core.io.EncodingSupporter
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNormalizers
public void setNormalizers(com.kennycason.kumo.nlp.normalize.Normalizer[] value)
Sets the normalizers to use.
- Parameters:
- value - the normalizers
-
getNormalizers
public com.kennycason.kumo.nlp.normalize.Normalizer[] getNormalizers()
Returns the normalizers to use.
- Returns:
- the normalizers
-
normalizersTipText
public String normalizersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setMinWordLength
public void setMinWordLength(int value)
Sets the minimum length for words.
- Parameters:
- value - the minimum
-
getMinWordLength
public int getMinWordLength()
Returns the minimum length for words.
- Returns:
- the minimum
-
minWordLengthTipText
public String minWordLengthTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setMaxWordLength
public void setMaxWordLength(int value)
Sets the maximum length for words.
- Parameters:
- value - the maximum
-
getMaxWordLength
public int getMaxWordLength()
Returns the maximum length for words.
- Returns:
- the maximum
-
maxWordLengthTipText
public String maxWordLengthTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumFrequencies
public void setNumFrequencies(int value)
Sets the number of frequencies to return.
- Parameters:
- value - the number of frequencies
-
getNumFrequencies
public int getNumFrequencies()
Returns the number of frequencies to return.
- Returns:
- the number of frequencies
-
numFrequenciesTipText
public String numFrequenciesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setStopwords
public void setStopwords(adams.flow.control.StorageName value)
Sets the storage item that holds the string array of stopwords to use.
- Parameters:
- value - the storage name
-
getStopwords
public adams.flow.control.StorageName getStopwords()
Returns the storage item that holds the string array of stopwords to use.
- Returns:
- the storage name
-
stopwordsTipText
public String stopwordsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getQuickInfo
public String getQuickInfo()
Returns a quick info about the object, which can be displayed in the GUI.
- Specified by:
- getQuickInfo in interface adams.flow.core.Actor
- Specified by:
- getQuickInfo in interface adams.core.QuickInfoSupporter
- Overrides:
- getQuickInfo in class adams.flow.core.AbstractActor
- Returns:
- null if no info available, otherwise short string
-
accepts
public Class[] accepts()
Returns the class that the consumer accepts.
- Specified by:
- accepts in interface adams.flow.core.InputConsumer
- Returns:
- the Class of objects that can be processed
-
generates
public Class[] generates()
Returns the class of objects that it generates.
- Specified by:
- generates in interface adams.flow.core.OutputProducer
- Returns:
- the Class of the generated tokens
-
doExecute
protected String doExecute()
Executes the flow item.
- Specified by:
- doExecute in class adams.flow.core.AbstractActor
- Returns:
- null if everything is fine, otherwise error message
-
-
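The return convention documented for doExecute() (null if everything is fine, otherwise an error message) can be illustrated with a small, self-contained sketch. The class ErrorStringConvention and its check method are hypothetical names introduced here for illustration; they are not part of ADAMS and do not reproduce the actor's own checks.

```java
/** Hypothetical helper illustrating the "null means success" convention; not part of ADAMS. */
class ErrorStringConvention {

  /** Returns null if the input is a String, otherwise an error message. */
  static String check(Object input) {
    if (!(input instanceof String)) {
      String actual = (input == null) ? "null" : input.getClass().getName();
      return "Expected java.lang.String, got: " + actual;
    }
    return null;  // no error
  }

  public static void main(String[] args) {
    // Callers test for null rather than catching an exception.
    String result = check(42);
    if (result != null)
      System.err.println("Execution failed: " + result);
  }
}
```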