Package adams.flow.transformer
Class WekaInstanceDumper
-
- All Implemented Interfaces:
AdditionalInformationHandler, BufferSupporter, CleanUpHandler, Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<Actor>, SizeOfHandler, Stoppable, StoppableWithFeedback, VariablesInspectionHandler, VariableChangeListener, Actor, ErrorHandler, FlushSupporter, InputConsumer, OutputProducer, Serializable, Comparable
public class WekaInstanceDumper extends AbstractTransformer implements BufferSupporter, FlushSupporter
Dumps weka.core.Instance objects into a file, in ARFF format by default (see the -format option for CSV and TAB). If header checking is enabled and the headers change, a new file is used.
The actor can also turn double arrays into weka.core.Instance objects (all attributes are assumed to be numeric).
Input/output:
- accepts:
weka.core.Instance
double[]
- generates:
java.lang.String
Valid options are:
-D <int> (property: debugLevel) The greater the number the more additional info the scheme may output to the console (0 = off). default: 0 minimum: 0
-name <java.lang.String> (property: name) The name of the actor. default: WekaInstanceDumper
-annotation <adams.core.base.BaseText> (property: annotations) The annotations to attach to this actor. default:
-skip (property: skip) If set to true, transformation is skipped and the input token is just forwarded as it is.
-stop-flow-on-error (property: stopFlowOnError) If set to true, the flow gets stopped in case this actor encounters an error; useful for critical actors.
-check (property: checkHeader) Whether to check the headers - if the headers change, the Instance object gets dumped into a new file.
-prefix <adams.core.io.PlaceholderFile> (property: outputPrefix) The path and partial filename of the output file; automatically removes '.arff' and '.csv' extensions, as they get added automatically. default: ${CWD}
-format <ARFF|CSV|TAB> (property: outputFormat) The format to output the data in. default: ARFF
-use-relation (property: useRelationNameAsFilename) If set to true, then the relation name replaces the name of the output file; e.g., if the output file is '/some/where/file.arff' and the relation is 'anneal', then the resulting file name will be '/some/where/anneal.arff'.
-keep-existing (property: keepExisting) If enabled, any output file that exists when the actor is executed for the first time (or when variables modify the actor) won't get replaced with the current header; useful when outputting data in multiple locations in the flow, but one needs to be careful not to store mixed content (e.g., a varying number of attributes).
-buffer-size <int> (property: bufferSize) The number of instances to buffer before writing to disk, in order to improve I/O performance. default: 1 minimum: 1
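The -use-relation example above can be reproduced with plain java.io.File. This is a minimal sketch, not the actor's code: the class RelationFilename and its method replaceName are hypothetical, and the sketch assumes the extension of the original file name is kept.

```java
import java.io.File;

/** Hypothetical sketch of the useRelationNameAsFilename option (not the actor's actual code). */
class RelationFilename {

  /** Replaces the file name of 'output' with the relation name, keeping directory and extension. */
  static String replaceName(String output, String relation) {
    File file = new File(output);
    String name = file.getName();
    int dot = name.lastIndexOf('.');
    String extension = (dot >= 0) ? name.substring(dot) : "";  // e.g. ".arff"
    // A null parent (no directory part) makes File behave like new File(child).
    return new File(file.getParentFile(), relation + extension).getPath();
  }

  public static void main(String[] args) {
    // Mirrors the example in the -use-relation option description.
    System.out.println(replaceName("/some/where/file.arff", "anneal"));
  }
}
```

On a Unix-like system, the main method prints the path from the option description.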
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
static class WekaInstanceDumper.OutputFormat
The format to output the data in.
-
Field Summary
static String BACKUP_BUFFER
the key for storing the buffer in the backup.
static String BACKUP_COUNTER
the key for storing the counter in the backup.
static String BACKUP_HEADER
the key for storing the header in the backup.
protected List<weka.core.Instance> m_Buffer
the buffer.
protected int m_BufferSize
the size of the buffer.
protected boolean m_CheckHeader
whether to check the header.
protected int m_Counter
the counter for the filenames.
protected weka.core.Instances m_Header
the header of the dataset.
protected boolean m_KeepExisting
whether to keep existing output files when the actor is executed for the first time, in order to allow appending to files from multiple locations in the flow.
protected WekaInstanceDumper.OutputFormat m_OutputFormat
the output format.
protected PlaceholderFile m_OutputPrefix
the output prefix.
protected boolean m_UseRelationNameAsFilename
whether to use the relation name as filename.
-
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
-
Fields inherited from class adams.flow.core.AbstractActor
m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
Fields inherited from interface adams.flow.core.Actor
FILE_EXTENSION, FILE_EXTENSION_GZ
-
-
Constructor Summary
WekaInstanceDumper()
-
Method Summary
Class[] accepts()
Returns the class that the consumer accepts.
protected Hashtable<String,Object> backupState()
Backs up the current state of the actor before updating the variables.
String bufferSizeTipText()
Returns the tip text for this property.
String checkHeaderTipText()
Returns the tip text for this property.
protected File createFilename(weka.core.Instances header)
Generates the filename for the output.
protected String createHeader(weka.core.Instances header)
Turns the dataset header into the appropriate format.
protected String createRow(weka.core.Instance row)
Turns the row into the appropriate format.
void defineOptions()
Adds options to the internal list of options.
protected String doExecute()
Executes the flow item.
Class[] generates()
Returns the class of objects that it generates.
int getBufferSize()
Returns the number of instances to buffer before writing them to disk.
boolean getCheckHeader()
Returns whether the header gets checked or not.
boolean getKeepExisting()
Returns whether any existing file is kept on first execution.
WekaInstanceDumper.OutputFormat getOutputFormat()
Returns the current output format.
PlaceholderFile getOutputPrefix()
Returns the current output prefix.
String getQuickInfo()
Returns a quick info about the actor, which will be displayed in the GUI.
boolean getUseRelationNameAsFilename()
Returns whether the relation name is used as filename.
String globalInfo()
Returns a string describing the object.
protected void initialize()
Initializes the members.
String keepExistingTipText()
Returns the tip text for this property.
String outputFormatTipText()
Returns the tip text for this property.
String outputPrefixTipText()
Returns the tip text for this property.
void performFlush()
Performs the flush.
protected void pruneBackup()
Removes entries from the backup.
protected void reset()
Resets the scheme.
protected void restoreState(Hashtable<String,Object> state)
Restores the state of the actor before the variables got updated.
void setBufferSize(int value)
Sets the number of instances to buffer before writing them to disk.
void setCheckHeader(boolean value)
Sets whether to check the header or not.
void setKeepExisting(boolean value)
Sets whether to keep any existing file on first execution.
void setOutputFormat(WekaInstanceDumper.OutputFormat value)
Sets the output format.
void setOutputPrefix(PlaceholderFile value)
Sets the prefix for the output (path + partial filename).
String setUp()
Initializes the item for flow execution.
void setUseRelationNameAsFilename(boolean value)
Sets whether to use the relation name as filename instead.
protected String updateVariables()
Gets called when the actor needs to be set up again because a variable changed.
String useRelationNameAsFilenameTipText()
Returns the tip text for this property.
void wrapUp()
Cleans up after the execution has finished.
protected String writeToDisk(boolean append)
Writes the content of the buffer to disk.
-
Methods inherited from class adams.flow.transformer.AbstractTransformer
currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute
-
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, variableChanged
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.flow.core.Actor
cleanUp, compareTo, destroy, equals, execute, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged
-
Methods inherited from interface adams.core.AdditionalInformationHandler
getAdditionalInformation
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel, setLoggingLevel
-
Methods inherited from interface adams.core.logging.LoggingSupporter
getLogger, isLoggingEnabled
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager
-
Methods inherited from interface adams.core.VariablesInspectionHandler
canInspectOptions
-
-
-
-
Field Detail
-
BACKUP_HEADER
public static final String BACKUP_HEADER
the key for storing the header in the backup.
- See Also:
- Constant Field Values
-
BACKUP_COUNTER
public static final String BACKUP_COUNTER
the key for storing the counter in the backup.
- See Also:
- Constant Field Values
-
BACKUP_BUFFER
public static final String BACKUP_BUFFER
the key for storing the buffer in the backup.
- See Also:
- Constant Field Values
-
m_Header
protected weka.core.Instances m_Header
the header of the dataset.
-
m_Counter
protected int m_Counter
the counter for the filenames.
-
m_CheckHeader
protected boolean m_CheckHeader
whether to check the header.
-
m_OutputPrefix
protected PlaceholderFile m_OutputPrefix
the output prefix.
-
m_OutputFormat
protected WekaInstanceDumper.OutputFormat m_OutputFormat
the output format.
-
m_UseRelationNameAsFilename
protected boolean m_UseRelationNameAsFilename
whether to use the relation name as filename.
-
m_KeepExisting
protected boolean m_KeepExisting
whether to keep existing output files when actor is called for the first time, in order to allow appending to files from multiple locations in flow.
-
m_BufferSize
protected int m_BufferSize
the size of the buffer.
-
m_Buffer
protected List<weka.core.Instance> m_Buffer
the buffer.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo
in interface GlobalInfoSupporter
- Specified by:
globalInfo
in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the gui
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions
in interface OptionHandler
- Overrides:
defineOptions
in class AbstractActor
-
initialize
protected void initialize()
Initializes the members.
- Overrides:
initialize
in class AbstractActor
-
getQuickInfo
public String getQuickInfo()
Returns a quick info about the actor, which will be displayed in the GUI.
- Specified by:
getQuickInfo
in interface Actor
- Specified by:
getQuickInfo
in interface QuickInfoSupporter
- Overrides:
getQuickInfo
in class AbstractActor
- Returns:
- null if no info available, otherwise short string
-
setCheckHeader
public void setCheckHeader(boolean value)
Sets whether to check the header or not.
- Parameters:
value
- if true then the headers get checked
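With header checking enabled, a changed header makes the actor move on to a new file, and m_Counter is the counter for the filenames. The decision can be modeled in plain Java as below. This is a hypothetical sketch, not the actor's implementation: the class HeaderCheck is made up, and the header is represented as a list of "name type" strings instead of weka.core.Instances.

```java
import java.util.List;

/** Hypothetical sketch of the header check: a changed header starts a new output file. */
class HeaderCheck {
  private List<String> header;  // attribute signature of the current file (assumed representation)
  private int counter = 0;      // bumped each time a changed header forces a new file

  /** Returns true if a new file has to be started for 'current', false if the current one is reused. */
  boolean update(List<String> current) {
    if (current.equals(header))
      return false;             // same header: keep writing to the current file
    if (header != null)
      counter++;                // header changed: move on to the next file
    header = List.copyOf(current);
    return true;                // first header seen, or a changed one
  }

  int getCounter() {
    return counter;
  }
}
```

The first header always requires a file; only later changes advance the counter.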
-
getCheckHeader
public boolean getCheckHeader()
Returns whether the header gets checked or not.
- Returns:
- true if the header gets checked
-
checkHeaderTipText
public String checkHeaderTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setOutputPrefix
public void setOutputPrefix(PlaceholderFile value)
Sets the prefix for the output (path + partial filename). Automatically removes .arff or .csv extensions from the partial file name, since they get added automatically.
- Parameters:
value
- the prefix
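The extension stripping described above can be sketched as follows. The class PrefixCleaner is hypothetical, and matching the extension case-insensitively is an assumption rather than documented behavior.

```java
/** Hypothetical sketch: drops a trailing ".arff" or ".csv" so that an extension can be added later. */
class PrefixCleaner {
  private static final String[] EXTENSIONS = {".arff", ".csv"};

  static String stripExtension(String prefix) {
    for (String ext : EXTENSIONS) {
      int start = prefix.length() - ext.length();
      // Case-insensitive comparison of the tail; regionMatches returns false if start < 0.
      if (prefix.regionMatches(true, start, ext, 0, ext.length()))
        return prefix.substring(0, start);
    }
    return prefix;  // no known extension: leave the prefix unchanged
  }
}
```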
-
getOutputPrefix
public PlaceholderFile getOutputPrefix()
Returns the current output prefix.
- Returns:
- the prefix
-
outputPrefixTipText
public String outputPrefixTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setOutputFormat
public void setOutputFormat(WekaInstanceDumper.OutputFormat value)
Sets the output format.
- Parameters:
value
- the format
-
getOutputFormat
public WekaInstanceDumper.OutputFormat getOutputFormat()
Returns the current output format.
- Returns:
- the format
-
outputFormatTipText
public String outputFormatTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setUseRelationNameAsFilename
public void setUseRelationNameAsFilename(boolean value)
Sets whether to use the relation name as filename instead.
- Parameters:
value
- if true then the relation name will be used
-
getUseRelationNameAsFilename
public boolean getUseRelationNameAsFilename()
Returns whether the relation name is used as filename.
- Returns:
- true if the relation name is used
-
useRelationNameAsFilenameTipText
public String useRelationNameAsFilenameTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setKeepExisting
public void setKeepExisting(boolean value)
Sets whether to keep any existing file on first execution.
- Parameters:
value
- if true then existing file is kept
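The effect of keepExisting on the first write can be modeled as a small decision rule. This is a hypothetical sketch (KeepExistingPolicy and shouldAppend are made-up names); it ignores header changes, which start a new file anyway when header checking is enabled.

```java
/** Hypothetical sketch: decides whether a write appends to the output file or replaces it. */
class KeepExistingPolicy {
  private final boolean keepExisting;
  private boolean firstWrite = true;

  KeepExistingPolicy(boolean keepExisting) {
    this.keepExisting = keepExisting;
  }

  /** Returns true if the next write should append (no header), false if it should replace the file. */
  boolean shouldAppend(boolean fileExists) {
    if (firstWrite) {
      firstWrite = false;
      // First execution: only an existing file that we were asked to keep is appended to.
      return keepExisting && fileExists;
    }
    return true;  // later writes append to the file started earlier
  }
}
```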
-
getKeepExisting
public boolean getKeepExisting()
Returns whether any existing file is kept on first execution.
- Returns:
- true if existing file is kept
-
keepExistingTipText
public String keepExistingTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setBufferSize
public void setBufferSize(int value)
Sets the number of instances to buffer before writing them to disk.
- Specified by:
setBufferSize
in interface BufferSupporter
- Parameters:
value
- the number of instances to buffer
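Buffering amounts to collecting rows and handing them on in batches of bufferSize, plus a flush for any partial batch (the actor implements FlushSupporter via performFlush()). The sketch below is hypothetical: RowBuffer is a made-up class, the Consumer stands in for the disk write, and rejecting values below the documented minimum of 1 is an assumption (the actor may handle such values differently).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of bufferSize: rows are collected and handed on in batches. */
class RowBuffer {
  private final int bufferSize;
  private final List<String> buffer = new ArrayList<>();
  private final Consumer<List<String>> sink;  // stands in for the actual disk write

  RowBuffer(int bufferSize, Consumer<List<String>> sink) {
    if (bufferSize < 1)
      throw new IllegalArgumentException("bufferSize must be at least 1");
    this.bufferSize = bufferSize;
    this.sink = sink;
  }

  /** Adds a row and writes a full batch as soon as bufferSize rows are waiting. */
  void add(String row) {
    buffer.add(row);
    if (buffer.size() >= bufferSize)
      flush();
  }

  /** Hands on whatever is buffered, e.g. a partial batch at the end of the flow. */
  void flush() {
    if (buffer.isEmpty())
      return;
    sink.accept(new ArrayList<>(buffer));
    buffer.clear();
  }
}
```

With the default bufferSize of 1, every row is handed on immediately.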
-
getBufferSize
public int getBufferSize()
Returns the number of instances to buffer before writing them to disk.
- Specified by:
getBufferSize
in interface BufferSupporter
- Returns:
- the number of instances to buffer
-
bufferSizeTipText
public String bufferSizeTipText()
Returns the tip text for this property.
- Specified by:
bufferSizeTipText
in interface BufferSupporter
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
pruneBackup
protected void pruneBackup()
Removes entries from the backup.
- Overrides:
pruneBackup
in class AbstractActor
- See Also:
reset()
-
backupState
protected Hashtable<String,Object> backupState()
Backs up the current state of the actor before updating the variables.
- Overrides:
backupState
in class AbstractTransformer
- Returns:
- the backup
- See Also:
AbstractActor.updateVariables()
,AbstractActor.restoreState(Hashtable)
-
restoreState
protected void restoreState(Hashtable<String,Object> state)
Restores the state of the actor before the variables got updated.
- Overrides:
restoreState
in class AbstractTransformer
- Parameters:
state
- the backup of the state to restore from
- See Also:
AbstractActor.updateVariables()
,AbstractActor.backupState()
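The BACKUP_HEADER, BACKUP_COUNTER and BACKUP_BUFFER fields are the keys for storing the header, counter and buffer in the backup. A minimal sketch of such a backup/restore round trip follows. DumperState is a hypothetical class, plain strings stand in for the Weka objects, and the key values are placeholders (the real ones are listed under Constant Field Values).

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

/** Hypothetical sketch of a backupState()/restoreState() round trip (not the actor's code). */
class DumperState {
  // Placeholder key values; the real constants are documented under "Constant Field Values".
  static final String BACKUP_HEADER = "header";
  static final String BACKUP_COUNTER = "counter";
  static final String BACKUP_BUFFER = "buffer";

  String header;                            // stands in for weka.core.Instances
  int counter;
  List<String> buffer = new ArrayList<>();  // stands in for List<weka.core.Instance>

  Hashtable<String, Object> backupState() {
    Hashtable<String, Object> state = new Hashtable<>();
    if (header != null)
      state.put(BACKUP_HEADER, header);     // Hashtable rejects null values
    state.put(BACKUP_COUNTER, counter);
    state.put(BACKUP_BUFFER, new ArrayList<>(buffer));  // copy, so later changes don't leak in
    return state;
  }

  @SuppressWarnings("unchecked")
  void restoreState(Hashtable<String, Object> state) {
    if (state.containsKey(BACKUP_HEADER))
      header = (String) state.get(BACKUP_HEADER);
    if (state.containsKey(BACKUP_COUNTER))
      counter = (Integer) state.get(BACKUP_COUNTER);
    if (state.containsKey(BACKUP_BUFFER))
      buffer = new ArrayList<>((List<String>) state.get(BACKUP_BUFFER));
  }
}
```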
-
reset
protected void reset()
Resets the scheme.
- Overrides:
reset
in class AbstractActor
-
accepts
public Class[] accepts()
Returns the class that the consumer accepts.
- Specified by:
accepts
in interface InputConsumer
- Returns:
- weka.core.Instance.class, double[].class
-
generates
public Class[] generates()
Returns the class of objects that it generates.
- Specified by:
generates
in interface OutputProducer
- Returns:
- java.lang.String.class
-
setUp
public String setUp()
Initializes the item for flow execution. Calls the reset() method before anything else.
- Specified by:
setUp
in interface Actor
- Overrides:
setUp
in class AbstractActor
- Returns:
- null if everything is fine, otherwise error message
- See Also:
AbstractActor.reset()
-
createFilename
protected File createFilename(weka.core.Instances header)
Generates the filename for the output.
- Parameters:
header
- the current relation
- Returns:
- the generated filename
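A possible shape for createFilename() is sketched below: the prefix, a counter when header checking may produce several files, and an extension matching the output format. Only the automatic addition of the .arff and .csv extensions is documented on this page; the "-" plus counter naming scheme, the ".tsv" extension for TAB, and the class FilenameSketch are assumptions.

```java
import java.io.File;

/** Hypothetical sketch of createFilename(): prefix + optional counter + format-specific extension. */
class FilenameSketch {
  enum Format { ARFF, CSV, TAB }

  static File createFilename(String prefix, int counter, boolean checkHeader, Format format) {
    // Assumed naming scheme: the counter is only appended when header checking is enabled.
    String base = checkHeader ? prefix + "-" + counter : prefix;
    String extension;
    switch (format) {
      case CSV:
        extension = ".csv";
        break;
      case TAB:
        extension = ".tsv";  // assumed extension for tab-separated output
        break;
      default:
        extension = ".arff";
        break;
    }
    return new File(base + extension);
  }
}
```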
-
createHeader
protected String createHeader(weka.core.Instances header)
Turns the dataset header into the appropriate format.
- Parameters:
header
- the header to convert
- Returns:
- the generated output
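createHeader() turns the header into the chosen output format. The hypothetical sketch below does the same for a list of numeric attribute names: a full ARFF header for ARFF, and a single line of column names for CSV and TAB. HeaderSketch is a made-up class, only numeric attributes are covered, and ARFF quoting of names that contain spaces is omitted.

```java
import java.util.List;

/** Hypothetical sketch of createHeader(): one header per output format, numeric attributes only. */
class HeaderSketch {
  enum Format { ARFF, CSV, TAB }

  static String createHeader(String relation, List<String> attributes, Format format) {
    switch (format) {
      case CSV:
        return String.join(",", attributes);   // column names as the first line
      case TAB:
        return String.join("\t", attributes);
      default:
        StringBuilder sb = new StringBuilder("@relation " + relation + "\n\n");
        for (String name : attributes)
          sb.append("@attribute ").append(name).append(" numeric\n");
        return sb.append("\n@data").toString();
    }
  }
}
```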
-
createRow
protected String createRow(weka.core.Instance row)
Turns the row into the appropriate format.
- Parameters:
row
- the row to convert
- Returns:
- the generated output
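createRow() does the equivalent for a single row. In this hypothetical sketch (RowSketch is a made-up class), the values of a double[] are joined with commas for ARFF and CSV and with tabs for TAB. Writing Weka's internal missing value (NaN) as "?" follows the ARFF convention; using it for CSV and TAB as well is an assumption.

```java
import java.util.StringJoiner;

/** Hypothetical sketch of createRow(): numeric values joined with a format-specific separator. */
class RowSketch {
  enum Format { ARFF, CSV, TAB }

  static String createRow(double[] values, Format format) {
    String separator = (format == Format.TAB) ? "\t" : ",";  // ARFF and CSV both use commas
    StringJoiner row = new StringJoiner(separator);
    for (double value : values) {
      // NaN marks a missing value in Weka; '?' is the ARFF missing-value marker.
      row.add(Double.isNaN(value) ? "?" : Double.toString(value));
    }
    return row.toString();
  }
}
```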
-
writeToDisk
protected String writeToDisk(boolean append)
Writes the content of the buffer to disk.
- Parameters:
append
- whether to append
- Returns:
- error message if something went wrong, null otherwise
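writeToDisk(boolean append) reports failure as an error message instead of throwing. The sketch below mirrors that contract with java.nio.file.Files. BufferWriter is hypothetical, and writing the header only when not appending is an assumption about how the header and appended rows interact.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of writeToDisk(append): writes buffered rows, returns an error message or null. */
class BufferWriter {
  private final List<String> buffer = new ArrayList<>();

  void add(String row) {
    buffer.add(row);
  }

  /** Writes the header (unless appending) plus all buffered rows, then clears the buffer. */
  String writeToDisk(Path file, String header, boolean append) {
    StringBuilder content = new StringBuilder();
    if (!append)
      content.append(header);
    for (String row : buffer)
      content.append(row).append('\n');
    try {
      if (append)
        Files.writeString(file, content, StandardCharsets.UTF_8,
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
      else
        Files.writeString(file, content, StandardCharsets.UTF_8);  // create or truncate
      buffer.clear();
      return null;
    } catch (IOException e) {
      return "Failed to write to " + file + ": " + e.getMessage();
    }
  }
}
```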
-
updateVariables
protected String updateVariables()
Gets called when the actor needs to be set up again because a variable changed.
- Overrides:
updateVariables
in class AbstractActor
- Returns:
- null if everything is fine, otherwise error message
-
doExecute
protected String doExecute()
Executes the flow item.
- Specified by:
doExecute
in class AbstractActor
- Returns:
- null if everything is fine, otherwise error message
-
wrapUp
public void wrapUp()
Cleans up after the execution has finished.
- Specified by:
wrapUp
in interface Actor
- Overrides:
wrapUp
in class AbstractTransformer
-
performFlush
public void performFlush()
Performs the flush.
- Specified by:
performFlush
in interface FlushSupporter
-
-