Package adams.flow.transformer
Class WekaInstanceDumper
- All Implemented Interfaces:
AdditionalInformationHandler, BufferSupporter, CleanUpHandler, Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<Actor>, SizeOfHandler, Stoppable, StoppableWithFeedback, VariablesInspectionHandler, VariableChangeListener, Actor, ErrorHandler, FlushSupporter, InputConsumer, OutputProducer, Serializable, Comparable
public class WekaInstanceDumper extends AbstractTransformer implements BufferSupporter, FlushSupporter
Dumps weka.core.Instance objects into an ARFF file. If the headers change and the header-check is enabled, then a new file will be used.
The actor can also turn double arrays into weka.core.Instance objects (all attributes are assumed to be numeric).
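The header-check behaviour described above can be pictured with a small, library-free sketch. It is not the ADAMS implementation: the class name HeaderRotationSketch, the method fileIndexFor, and the use of attribute-name lists in place of weka.core.Instances headers are all assumptions made for illustration. How the actor actually derives a new file name from its counter is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: tracks which output file index a row belongs to. */
final class HeaderRotationSketch {
    /** Attribute names of the most recently seen header (stand-in for weka.core.Instances). */
    private List<String> lastHeader;
    /** Number of distinct headers seen so far; used here as the file index. */
    private int counter;

    /**
     * Returns the index of the file the next row would go to.
     * With checkHeader disabled, a changed header does not start a new file.
     */
    int fileIndexFor(List<String> header, boolean checkHeader) {
        if (lastHeader == null) {
            // first row ever: remember the header and start with file 1
            lastHeader = new ArrayList<>(header);
            counter = 1;
        } else if (checkHeader && !lastHeader.equals(header)) {
            // header changed and the check is enabled: switch to a new file
            lastHeader = new ArrayList<>(header);
            counter++;
        }
        return counter;
    }
}
```

In this sketch, rows with an unchanged header keep the same index, and the index only advances when the check is enabled and the header differs from the previous one.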
Input/output:
- accepts:
weka.core.Instance
double[]
- generates:
java.lang.String
Valid options are:
-D <int> (property: debugLevel) The greater the number the more additional info the scheme may output to the console (0 = off). default: 0 minimum: 0
-name <java.lang.String> (property: name) The name of the actor. default: WekaInstanceDumper
-annotation <adams.core.base.BaseText> (property: annotations) The annotations to attach to this actor. default:
-skip (property: skip) If set to true, transformation is skipped and the input token is just forwarded as it is.
-stop-flow-on-error (property: stopFlowOnError) If set to true, the flow gets stopped in case this actor encounters an error; useful for critical actors.
-check (property: checkHeader) Whether to check the headers - if the headers change, the Instance object gets dumped into a new file.
-prefix <adams.core.io.PlaceholderFile> (property: outputPrefix) The path and partial filename of the output file; automatically removes '.arff' and '.csv' extensions, as they get added automatically. default: ${CWD}
-format <ARFF|CSV|TAB> (property: outputFormat) The format to output the data in. default: ARFF
-use-relation (property: useRelationNameAsFilename) If set to true, then the relation name replaces the name of the output file; eg if the output file is '/some/where/file.arff' and the relation is 'anneal' then the resulting file name will be '/some/where/anneal.arff'.
-keep-existing (property: keepExisting) If enabled, any output file that exists when the actor is executed for the first time (or when variables modify the actor) won't get replaced with the current header; useful when outputting data from multiple locations in the flow, but one needs to be careful not to store mixed content (e.g. a varying number of attributes).
-buffer-size <int> (property: bufferSize) The number of instances to buffer before writing to disk, in order to improve I/O performance. default: 1 minimum: 1
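The -use-relation example above ('/some/where/file.arff' with relation 'anneal' becoming '/some/where/anneal.arff') can be reproduced with plain string handling. The following is only a sketch of that naming rule: RelationFilenameSketch and withRelationName are hypothetical names, '/' is assumed as the path separator, and any sanitising of relation names that contain characters unsuitable for file names is left out.

```java
/** Hypothetical sketch of the -use-relation naming rule. */
final class RelationFilenameSketch {
    /**
     * Replaces the file name (without extension) of outputFile with the
     * relation name, keeping the directory and the extension.
     */
    static String withRelationName(String outputFile, String relation) {
        int slash = outputFile.lastIndexOf('/');
        String dir = outputFile.substring(0, slash + 1);  // empty if there is no directory part
        String name = outputFile.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        String ext = (dot >= 0) ? name.substring(dot) : "";  // includes the leading dot
        return dir + relation + ext;
    }
}
```

Calling withRelationName("/some/where/file.arff", "anneal") yields the path given in the option description.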
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
static class WekaInstanceDumper.OutputFormat: The format to output the data in.
-
Field Summary
Fields
static String BACKUP_BUFFER: the key for storing the buffer in the backup.
static String BACKUP_COUNTER: the key for storing the counter in the backup.
static String BACKUP_HEADER: the key for storing the header in the backup.
protected List<weka.core.Instance> m_Buffer: the buffer.
protected int m_BufferSize: the size of the buffer.
protected boolean m_CheckHeader: whether to check the header.
protected int m_Counter: the counter for the filenames.
protected weka.core.Instances m_Header: the header of the dataset.
protected boolean m_KeepExisting: whether to keep existing output files when actor is called for the first time, in order to allow appending to files from multiple locations in flow.
protected WekaInstanceDumper.OutputFormat m_OutputFormat: the output format.
protected PlaceholderFile m_OutputPrefix: the output prefix.
protected boolean m_UseRelationNameAsFilename: whether to use the relation name as filename.
protected boolean m_Writing: whether currently writing to disk.
-
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
-
Fields inherited from class adams.flow.core.AbstractActor
m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
Fields inherited from interface adams.flow.core.Actor
FILE_EXTENSION, FILE_EXTENSION_GZ
-
-
Constructor Summary
Constructors
WekaInstanceDumper()
-
Method Summary
All Methods
Class[] accepts(): Returns the class that the consumer accepts.
protected Hashtable<String,Object> backupState(): Backs up the current state of the actor before updating the variables.
String bufferSizeTipText(): Returns the tip text for this property.
String checkHeaderTipText(): Returns the tip text for this property.
protected File createFilename(weka.core.Instances header): Generates the filename for the output.
protected String createHeader(weka.core.Instances header): Turns the dataset header into the appropriate format.
protected String createRow(weka.core.Instance row): Turns the row into the appropriate format.
void defineOptions(): Adds options to the internal list of options.
protected String doExecute(): Executes the flow item.
Class[] generates(): Returns the class of objects that it generates.
int getBufferSize(): Returns the number of instances to buffer before writing them to disk.
boolean getCheckHeader(): Returns whether the header gets checked or not.
boolean getKeepExisting(): Returns whether any existing file is kept on first execution.
WekaInstanceDumper.OutputFormat getOutputFormat(): Returns the current output format.
PlaceholderFile getOutputPrefix(): Returns the current output prefix.
String getQuickInfo(): Returns a quick info about the actor, which will be displayed in the GUI.
boolean getUseRelationNameAsFilename(): Returns whether the relation name is used as filename.
String globalInfo(): Returns a string describing the object.
protected void initialize(): Initializes the members.
String keepExistingTipText(): Returns the tip text for this property.
String outputFormatTipText(): Returns the tip text for this property.
String outputPrefixTipText(): Returns the tip text for this property.
void performFlush(): Performs the flush.
protected void pruneBackup(): Removes entries from the backup.
protected void reset(): Resets the scheme.
protected void restoreState(Hashtable<String,Object> state): Restores the state of the actor before the variables got updated.
void setBufferSize(int value): Sets the number of instances to buffer before writing them to disk.
void setCheckHeader(boolean value): Sets whether to check the header or not.
void setKeepExisting(boolean value): Sets whether to keep any existing file on first execution.
void setOutputFormat(WekaInstanceDumper.OutputFormat value): Sets the output format.
void setOutputPrefix(PlaceholderFile value): Sets the prefix for the output (path + partial filename).
String setUp(): Initializes the item for flow execution.
void setUseRelationNameAsFilename(boolean value): Sets whether to use the relation name as filename instead.
protected String updateVariables(): Gets called when the actor needs to be re-setUp when a variable changes.
String useRelationNameAsFilenameTipText(): Returns the tip text for this property.
void wrapUp(): Cleans up after the execution has finished.
protected String writeToDisk(boolean append): Writes the content of the buffer to disk.
-
Methods inherited from class adams.flow.transformer.AbstractTransformer
currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute
-
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, variableChanged
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.flow.core.Actor
cleanUp, compareTo, destroy, equals, execute, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged
-
Methods inherited from interface adams.core.AdditionalInformationHandler
getAdditionalInformation
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel, setLoggingLevel
-
Methods inherited from interface adams.core.logging.LoggingSupporter
getLogger, isLoggingEnabled
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager
-
Methods inherited from interface adams.core.VariablesInspectionHandler
canInspectOptions
-
-
Field Detail
-
BACKUP_HEADER
public static final String BACKUP_HEADER
the key for storing the header in the backup.
- See Also:
- Constant Field Values
-
BACKUP_COUNTER
public static final String BACKUP_COUNTER
the key for storing the counter in the backup.
- See Also:
- Constant Field Values
-
BACKUP_BUFFER
public static final String BACKUP_BUFFER
the key for storing the buffer in the backup.
- See Also:
- Constant Field Values
-
m_Header
protected weka.core.Instances m_Header
the header of the dataset.
-
m_Counter
protected int m_Counter
the counter for the filenames.
-
m_CheckHeader
protected boolean m_CheckHeader
whether to check the header.
-
m_OutputPrefix
protected PlaceholderFile m_OutputPrefix
the output prefix.
-
m_OutputFormat
protected WekaInstanceDumper.OutputFormat m_OutputFormat
the output format.
-
m_UseRelationNameAsFilename
protected boolean m_UseRelationNameAsFilename
whether to use the relation name as filename.
-
m_KeepExisting
protected boolean m_KeepExisting
whether to keep existing output files when actor is called for the first time, in order to allow appending to files from multiple locations in flow.
-
m_BufferSize
protected int m_BufferSize
the size of the buffer.
-
m_Buffer
protected List<weka.core.Instance> m_Buffer
the buffer.
-
m_Writing
protected boolean m_Writing
whether currently writing to disk.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractActor
-
initialize
protected void initialize()
Initializes the members.
- Overrides:
initialize in class AbstractActor
-
getQuickInfo
public String getQuickInfo()
Returns a quick info about the actor, which will be displayed in the GUI.
- Specified by:
getQuickInfo in interface Actor
- Specified by:
getQuickInfo in interface QuickInfoSupporter
- Overrides:
getQuickInfo in class AbstractActor
- Returns:
- null if no info available, otherwise short string
-
setCheckHeader
public void setCheckHeader(boolean value)
Sets whether to check the header or not.
- Parameters:
value - if true then the headers get checked
-
getCheckHeader
public boolean getCheckHeader()
Returns whether the header gets checked or not.
- Returns:
- true if the header gets checked
-
checkHeaderTipText
public String checkHeaderTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setOutputPrefix
public void setOutputPrefix(PlaceholderFile value)
Sets the prefix for the output (path + partial filename). Automatically removes .arff or .csv extensions from the partial file name since they get added automatically.
- Parameters:
value - the prefix
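The extension handling mentioned for the output prefix could look roughly like the following. This is a guess at the behaviour rather than the actor's code: PrefixSketch and stripKnownExtension are hypothetical names, the prefix is treated as a plain string instead of a PlaceholderFile, and matching the extension case-insensitively is an assumption.

```java
/** Hypothetical sketch: removes a trailing .arff or .csv from an output prefix. */
final class PrefixSketch {
    private static final String[] KNOWN_EXTENSIONS = {".arff", ".csv"};

    /** Returns the prefix without a trailing known extension, or unchanged if none is present. */
    static String stripKnownExtension(String prefix) {
        for (String ext : KNOWN_EXTENSIONS) {
            int start = prefix.length() - ext.length();
            // regionMatches returns false for a negative start, so short prefixes are safe
            if (prefix.regionMatches(true, start, ext, 0, ext.length())) {
                return prefix.substring(0, start);
            }
        }
        return prefix;
    }
}
```

Stripping the extension up front means the extension matching the chosen output format can be appended later without producing names such as 'out.arff.csv'.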
-
getOutputPrefix
public PlaceholderFile getOutputPrefix()
Returns the current output prefix.
- Returns:
- the prefix
-
outputPrefixTipText
public String outputPrefixTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setOutputFormat
public void setOutputFormat(WekaInstanceDumper.OutputFormat value)
Sets the output format.
- Parameters:
value - the format
-
getOutputFormat
public WekaInstanceDumper.OutputFormat getOutputFormat()
Returns the current output format.
- Returns:
- the format
-
outputFormatTipText
public String outputFormatTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUseRelationNameAsFilename
public void setUseRelationNameAsFilename(boolean value)
Sets whether to use the relation name as filename instead.
- Parameters:
value - if true then the relation name will be used
-
getUseRelationNameAsFilename
public boolean getUseRelationNameAsFilename()
Returns whether the relation name is used as filename.
- Returns:
- true if the relation name is used
-
useRelationNameAsFilenameTipText
public String useRelationNameAsFilenameTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setKeepExisting
public void setKeepExisting(boolean value)
Sets whether to keep any existing file on first execution.
- Parameters:
value - if true then existing file is kept
-
getKeepExisting
public boolean getKeepExisting()
Returns whether any existing file is kept on first execution.
- Returns:
- true if existing file is kept
-
keepExistingTipText
public String keepExistingTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setBufferSize
public void setBufferSize(int value)
Sets the number of instances to buffer before writing them to disk.
- Specified by:
setBufferSize in interface BufferSupporter
- Parameters:
value - the number of instances to buffer
-
getBufferSize
public int getBufferSize()
Returns the number of instances to buffer before writing them to disk.
- Specified by:
getBufferSize in interface BufferSupporter
- Returns:
- the number of instances to buffer
-
bufferSizeTipText
public String bufferSizeTipText()
Returns the tip text for this property.
- Specified by:
bufferSizeTipText in interface BufferSupporter
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
pruneBackup
protected void pruneBackup()
Removes entries from the backup.
- Overrides:
pruneBackup in class AbstractActor
- See Also:
reset()
-
backupState
protected Hashtable<String,Object> backupState()
Backs up the current state of the actor before updating the variables.
- Overrides:
backupState in class AbstractTransformer
- Returns:
- the backup
- See Also:
AbstractActor.updateVariables(), AbstractActor.restoreState(Hashtable)
-
restoreState
protected void restoreState(Hashtable<String,Object> state)
Restores the state of the actor before the variables got updated.
- Overrides:
restoreState in class AbstractTransformer
- Parameters:
state - the backup of the state to restore from
- See Also:
AbstractActor.updateVariables(), AbstractActor.backupState()
-
reset
protected void reset()
Resets the scheme.
- Overrides:
reset in class AbstractActor
-
accepts
public Class[] accepts()
Returns the class that the consumer accepts.
- Specified by:
accepts in interface InputConsumer
- Returns:
- weka.core.Instance.class, double[].class
-
generates
public Class[] generates()
Returns the class of objects that it generates.
- Specified by:
generates in interface OutputProducer
- Returns:
- java.lang.String.class
-
setUp
public String setUp()
Initializes the item for flow execution. Also calls the reset() method first before anything else.
- Specified by:
setUp in interface Actor
- Overrides:
setUp in class AbstractActor
- Returns:
- null if everything is fine, otherwise error message
- See Also:
AbstractActor.reset()
-
createFilename
protected File createFilename(weka.core.Instances header)
Generates the filename for the output.
- Parameters:
header - the current relation
- Returns:
- the generated filename
-
createHeader
protected String createHeader(weka.core.Instances header)
Turns the dataset header into the appropriate format.
- Parameters:
header - the header to convert
- Returns:
- the generated output
-
createRow
protected String createRow(weka.core.Instance row)
Turns the row into the appropriate format.
- Parameters:
row - the row to convert
- Returns:
- the generated output
-
writeToDisk
protected String writeToDisk(boolean append)
Writes the content of the buffer to disk.
- Parameters:
append - whether to append
- Returns:
- error message if something went wrong, null otherwise
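To illustrate how a buffer size, an append flag and the "null on success, error message otherwise" convention fit together, here is a minimal standalone sketch. It only mirrors the documented contract: BufferedRowWriterSketch, add, flush and pending are hypothetical names, rows are plain strings rather than weka.core.Instance objects, and this sketch always appends, whereas the actor decides per call via the append parameter.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: collects rows and appends them to a file in batches. */
final class BufferedRowWriterSketch {
    private final Path file;
    private final int bufferSize;
    private final List<String> buffer = new ArrayList<>();

    BufferedRowWriterSketch(Path file, int bufferSize) {
        this.file = file;
        this.bufferSize = Math.max(1, bufferSize);  // documented minimum is 1
    }

    /** Buffers a row and writes once the buffer is full; returns null or an error message. */
    String add(String row) {
        buffer.add(row);
        return (buffer.size() >= bufferSize) ? flush() : null;
    }

    /** Appends all buffered rows to the file; returns null or an error message. */
    String flush() {
        if (buffer.isEmpty()) {
            return null;
        }
        try {
            Files.write(file, buffer, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            buffer.clear();
            return null;
        } catch (IOException e) {
            return "Failed to write to " + file + ": " + e.getMessage();
        }
    }

    /** Number of rows waiting to be written. */
    int pending() {
        return buffer.size();
    }
}
```

With a buffer size of 1 (the documented default) every row is written immediately; larger values trade fewer disk writes for rows that stay in memory until the buffer fills or a flush is requested.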
-
updateVariables
protected String updateVariables()
Gets called when the actor needs to be re-setUp when a variable changes.
- Overrides:
updateVariables in class AbstractActor
- Returns:
- null if everything is fine, otherwise error message
-
doExecute
protected String doExecute()
Executes the flow item.
- Specified by:
doExecute in class AbstractActor
- Returns:
- null if everything is fine, otherwise error message
-
wrapUp
public void wrapUp()
Cleans up after the execution has finished.
- Specified by:
wrapUp in interface Actor
- Overrides:
wrapUp in class AbstractTransformer
-
performFlush
public void performFlush()
Performs the flush.
- Specified by:
performFlush in interface FlushSupporter
-
-