adams.flow.transformer
Class WekaInstanceDumper

java.lang.Object
  extended by adams.core.ConsoleObject
      extended by adams.core.option.AbstractOptionHandler
          extended by adams.flow.core.AbstractActor
              extended by adams.flow.transformer.AbstractTransformer
                  extended by adams.flow.transformer.WekaInstanceDumper
All Implemented Interfaces:
AdditionalInformationHandler, CleanUpHandler, Debuggable, DebugOutputHandler, Destroyable, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<AbstractActor>, SizeOfHandler, Stoppable, VariableChangeListener, ErrorHandler, InputConsumer, OutputProducer, Serializable, Comparable

public class WekaInstanceDumper
extends AbstractTransformer

Dumps weka.core.Instance objects into an ARFF file. If the headers change and the header-check is enabled, then a new file will be used.
The actor can also turn double arrays into weka.core.Instance objects (all attributes are assumed to be numeric).

Input/output:
- accepts:
   weka.core.Instance
   double[]
- generates:
   java.lang.String

Valid options are:

-D <int> (property: debugLevel)
    The greater the number the more additional info the scheme may output to 
    the console (0 = off).
    default: 0
    minimum: 0
 
-name <java.lang.String> (property: name)
    The name of the actor.
    default: WekaInstanceDumper
 
-annotation <adams.core.base.BaseText> (property: annotations)
    The annotations to attach to this actor.
    default: 
 
-skip (property: skip)
    If set to true, transformation is skipped and the input token is just forwarded 
    as it is.
 
-stop-flow-on-error (property: stopFlowOnError)
    If set to true, the flow gets stopped in case this actor encounters an error;
     useful for critical actors.
 
-check (property: checkHeader)
    Whether to check the headers - if the headers change, the Instance object 
    gets dumped into a new file.
 
-prefix <adams.core.io.PlaceholderFile> (property: outputPrefix)
    The path and partial filename of the output file; 'arff' and 'csv' 
    extensions are removed from it, as they get added automatically.
    default: ${CWD}
 
-format <ARFF|CSV|TAB> (property: outputFormat)
    The format to output the data in.
    default: ARFF
 
-use-relation (property: useRelationNameAsFilename)
    If set to true, then the relation name replaces the name of the output file;
     e.g., if the output file is '/some/where/file.arff' and the relation is 'anneal'
     then the resulting file name will be '/some/where/anneal.arff'.
 
-keep-existing (property: keepExisting)
    If enabled, any output file that exists when the actor is executed for the 
    first time (or variables modify the actor) won't get replaced with the current 
    header; useful when outputting data in multiple locations in the flow, but 
    one needs to be careful not to store mixed content (e.g., varying numbers 
    of attributes).
 
-buffer-size <int> (property: bufferSize)
    The number of instances to buffer before writing to disk, in order to improve 
    I/O performance.
    default: 1
    minimum: 1
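
Example (illustrative only): the options above could be combined into a single
command line as sketched below. The layout (class name followed by options) is
an assumption, and the file name part 'dump' in the prefix is hypothetical; only
the flags themselves are taken from the list above.

```text
adams.flow.transformer.WekaInstanceDumper -check -prefix ${CWD}/dump -format CSV -buffer-size 100
```

With these settings, header checking is enabled, output is written in CSV
format, and 100 instances are buffered before each write to disk.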
 

Version:
$Revision: 5024 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Nested Class Summary
static class WekaInstanceDumper.OutputFormat
          The format to output the data in.
 
Field Summary
static String BACKUP_BUFFER
          the key for storing the buffer in the backup.
static String BACKUP_COUNTER
          the key for storing the counter in the backup.
static String BACKUP_HEADER
          the key for storing the header in the backup.
protected  Vector<weka.core.Instance> m_Buffer
          the buffer.
protected  int m_BufferSize
          the size of the buffer.
protected  boolean m_CheckHeader
          whether to check the header.
protected  int m_Counter
          the counter for the filenames.
protected  weka.core.Instances m_Header
          the header of the dataset.
protected  boolean m_KeepExisting
          whether to keep existing output files when actor is called for the first time, in order to allow appending to files from multiple locations in flow.
protected  WekaInstanceDumper.OutputFormat m_OutputFormat
          the output format.
protected  PlaceholderFile m_OutputPrefix
          the output prefix.
protected  boolean m_UseRelationNameAsFilename
          whether to use the relation name as filename.
 
Fields inherited from class adams.flow.transformer.AbstractTransformer
BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
 
Fields inherited from class adams.flow.core.AbstractActor
FILE_EXTENSION, FILE_EXTENSION_GZ, m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_FullName, m_Headless, m_Name, m_Parent, m_Root, m_Self, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
 
Fields inherited from class adams.core.option.AbstractOptionHandler
m_DebugLevel, m_OptionManager
 
Constructor Summary
WekaInstanceDumper()
           
 
Method Summary
 Class[] accepts()
          Returns the classes that the consumer accepts.
protected  Hashtable<String,Object> backupState()
          Backs up the current state of the actor before updating the variables.
 String bufferSizeTipText()
          Returns the tip text for this property.
 String checkHeaderTipText()
          Returns the tip text for this property.
protected  File createFilename(weka.core.Instances header)
          Generates the filename for the output.
protected  String createHeader(weka.core.Instances header)
          Turns the dataset header into the appropriate format.
protected  String createRow(weka.core.Instance row)
          Turns the row into the appropriate format.
 void defineOptions()
          Adds options to the internal list of options.
protected  String doExecute()
          Executes the flow item.
 Class[] generates()
          Returns the class of objects that it generates.
 int getBufferSize()
          Returns the number of instances to buffer before writing them to disk.
 boolean getCheckHeader()
          Returns whether the header gets checked or not.
 boolean getKeepExisting()
          Returns whether existing output files are kept.
 WekaInstanceDumper.OutputFormat getOutputFormat()
          Returns the current output format.
 PlaceholderFile getOutputPrefix()
          Returns the current output prefix.
 String getQuickInfo()
          Returns a quick info about the actor, which will be displayed in the GUI.
 boolean getUseRelationNameAsFilename()
          Returns whether the relation name is used as filename.
 String globalInfo()
          Returns a string describing the object.
protected  void initialize()
          Initializes the members.
 String keepExistingTipText()
          Returns the tip text for this property.
 String outputFormatTipText()
          Returns the tip text for this property.
 String outputPrefixTipText()
          Returns the tip text for this property.
protected  void pruneBackup()
          Removes entries from the backup.
protected  void reset()
          Resets the scheme.
protected  void restoreState(Hashtable<String,Object> state)
          Restores the state of the actor before the variables got updated.
 void setBufferSize(int value)
          Sets the number of instances to buffer before writing them to disk.
 void setCheckHeader(boolean value)
          Sets whether to check the header or not.
 void setKeepExisting(boolean value)
          Sets whether to keep existing output files.
 void setOutputFormat(WekaInstanceDumper.OutputFormat value)
          Sets the output format.
 void setOutputPrefix(PlaceholderFile value)
          Sets the prefix for the output (path + partial filename).
 String setUp()
          Initializes the item for flow execution.
 void setUseRelationNameAsFilename(boolean value)
          Sets whether to use the relation name as filename instead.
 String useRelationNameAsFilenameTipText()
          Returns the tip text for this property.
 void wrapUp()
          Cleans up after the execution has finished.
protected  String writeToDisk(boolean append)
          Writes the content of the buffer to disk.
 
Methods inherited from class adams.flow.transformer.AbstractTransformer
execute, hasPendingOutput, input, output, postExecute
 
Methods inherited from class adams.flow.core.AbstractActor
annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, debug, destroy, equals, findVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFullName, getName, getNextSibling, getParent, getPreviousSibling, getRoot, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isBackedUp, isExecuted, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, preExecute, pruneBackup, setAnnotations, setErrorHandler, setHeadless, setName, setParent, setSkip, setStopFlowOnError, setVariables, shallowCopy, shallowCopy, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, updateVariables, variableChanged
 
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, debug, debugLevelTipText, finishInit, getDebugLevel, getOptionManager, isDebugOn, newOptionManager, setDebugLevel, toCommandLine, toString
 
Methods inherited from class adams.core.ConsoleObject
getDebugging, getSystemErr, getSystemOut
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

BACKUP_HEADER

public static final String BACKUP_HEADER
the key for storing the header in the backup.

See Also:
Constant Field Values

BACKUP_COUNTER

public static final String BACKUP_COUNTER
the key for storing the counter in the backup.

See Also:
Constant Field Values

BACKUP_BUFFER

public static final String BACKUP_BUFFER
the key for storing the buffer in the backup.

See Also:
Constant Field Values

m_Header

protected weka.core.Instances m_Header
the header of the dataset.


m_Counter

protected int m_Counter
the counter for the filenames.


m_CheckHeader

protected boolean m_CheckHeader
whether to check the header.


m_OutputPrefix

protected PlaceholderFile m_OutputPrefix
the output prefix.


m_OutputFormat

protected WekaInstanceDumper.OutputFormat m_OutputFormat
the output format.


m_UseRelationNameAsFilename

protected boolean m_UseRelationNameAsFilename
whether to use the relation name as filename.


m_KeepExisting

protected boolean m_KeepExisting
whether to keep existing output files when actor is called for the first time, in order to allow appending to files from multiple locations in flow.


m_BufferSize

protected int m_BufferSize
the size of the buffer.


m_Buffer

protected Vector<weka.core.Instance> m_Buffer
the buffer.

Constructor Detail

WekaInstanceDumper

public WekaInstanceDumper()
Method Detail

globalInfo

public String globalInfo()
Returns a string describing the object.

Specified by:
globalInfo in class AbstractOptionHandler
Returns:
a description suitable for displaying in the gui

defineOptions

public void defineOptions()
Adds options to the internal list of options.

Specified by:
defineOptions in interface OptionHandler
Overrides:
defineOptions in class AbstractActor

initialize

protected void initialize()
Initializes the members.

Overrides:
initialize in class AbstractActor

getQuickInfo

public String getQuickInfo()
Returns a quick info about the actor, which will be displayed in the GUI.

Specified by:
getQuickInfo in interface QuickInfoSupporter
Overrides:
getQuickInfo in class AbstractActor
Returns:
null if no info available, otherwise short string

setCheckHeader

public void setCheckHeader(boolean value)
Sets whether to check the header or not.

Parameters:
value - if true then the headers get checked

getCheckHeader

public boolean getCheckHeader()
Returns whether the header gets checked or not.

Returns:
true if the header gets checked
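
Example (illustrative only): with the header check enabled, a changed header
causes the Instance to be dumped into a new file, and m_Counter is documented as
the counter for the filenames. The sketch below shows that idea in plain Java.
It is not the actor's implementation: the class HeaderCheckSketch and its method
are hypothetical, and a list of attribute names stands in for the
weka.core.Instances header.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not part of ADAMS: bumps a file counter on header change. */
final class HeaderCheckSketch {
    private List<String> currentHeader; // stand-in for the dataset header
    private int counter;                // stand-in for the filename counter

    /** Returns the file index to use for a row with the given attribute names. */
    int fileIndexFor(List<String> attributeNames) {
        if (currentHeader == null) {
            // first row seen: remember its header and start with file 1
            currentHeader = new ArrayList<>(attributeNames);
            counter = 1;
        } else if (!currentHeader.equals(attributeNames)) {
            // header changed: remember the new header and switch to a new file
            currentHeader = new ArrayList<>(attributeNames);
            counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        HeaderCheckSketch check = new HeaderCheckSketch();
        System.out.println(check.fileIndexFor(List.of("x", "y")));
        System.out.println(check.fileIndexFor(List.of("x", "y", "z")));
    }
}
```

Rows with an unchanged header keep the same index; only a differing header
moves on to the next file.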

checkHeaderTipText

public String checkHeaderTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setOutputPrefix

public void setOutputPrefix(PlaceholderFile value)
Sets the prefix for the output (path + partial filename). Automatically removes .arff or .csv extensions from the partial file name since they get added automatically.

Parameters:
value - the prefix
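
Example (illustrative only): the extension removal described above can be
pictured as follows. This is a plain-Java sketch under assumptions, not the
actor's code: the class PrefixSketch is hypothetical, it works on a String
rather than a PlaceholderFile, and the case-insensitive match is an assumption.

```java
/** Hypothetical sketch, not part of ADAMS: strips known extensions from a prefix. */
final class PrefixSketch {
    private static final String[] EXTENSIONS = {".arff", ".csv"};

    /** Removes a trailing ".arff" or ".csv" (ignoring case), if present. */
    static String stripKnownExtension(String prefix) {
        for (String ext : EXTENSIONS) {
            int start = prefix.length() - ext.length();
            // regionMatches returns false for a negative start, so short names are safe
            if (prefix.regionMatches(true, start, ext, 0, ext.length())) {
                return prefix.substring(0, start);
            }
        }
        return prefix; // nothing to strip
    }

    public static void main(String[] args) {
        System.out.println(stripKnownExtension("/some/where/file.arff"));
        System.out.println(stripKnownExtension("/some/where/file"));
    }
}
```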

getOutputPrefix

public PlaceholderFile getOutputPrefix()
Returns the current output prefix.

Returns:
the prefix

outputPrefixTipText

public String outputPrefixTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setOutputFormat

public void setOutputFormat(WekaInstanceDumper.OutputFormat value)
Sets the output format.

Parameters:
value - the format

getOutputFormat

public WekaInstanceDumper.OutputFormat getOutputFormat()
Returns the current output format.

Returns:
the format

outputFormatTipText

public String outputFormatTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setUseRelationNameAsFilename

public void setUseRelationNameAsFilename(boolean value)
Sets whether to use the relation name as filename instead.

Parameters:
value - if true then the relation name will be used

getUseRelationNameAsFilename

public boolean getUseRelationNameAsFilename()
Returns whether the relation name is used as filename.

Returns:
true if the relation name is used

useRelationNameAsFilenameTipText

public String useRelationNameAsFilenameTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setKeepExisting

public void setKeepExisting(boolean value)
Sets whether to keep output files that already exist when the actor is executed for the first time.

Parameters:
value - if true then existing output files are kept

getKeepExisting

public boolean getKeepExisting()
Returns whether output files that already exist when the actor is executed for the first time are kept.

Returns:
true if existing output files are kept

keepExistingTipText

public String keepExistingTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

setBufferSize

public void setBufferSize(int value)
Sets the number of instances to buffer before writing them to disk.

Parameters:
value - the number of instances to buffer

getBufferSize

public int getBufferSize()
Returns the number of instances to buffer before writing them to disk.

Returns:
the number of instances to buffer

bufferSizeTipText

public String bufferSizeTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the GUI or for listing the options.

pruneBackup

protected void pruneBackup()
Removes entries from the backup.

Overrides:
pruneBackup in class AbstractActor
See Also:
reset()

backupState

protected Hashtable<String,Object> backupState()
Backs up the current state of the actor before updating the variables.

Overrides:
backupState in class AbstractTransformer
Returns:
the backup

restoreState

protected void restoreState(Hashtable<String,Object> state)
Restores the state of the actor before the variables got updated.

Overrides:
restoreState in class AbstractTransformer
Parameters:
state - the backup of the state to restore from

reset

protected void reset()
Resets the scheme.

Overrides:
reset in class AbstractTransformer

accepts

public Class[] accepts()
Returns the classes that the consumer accepts.

Returns:
weka.core.Instance.class, double[].class

generates

public Class[] generates()
Returns the class of objects that it generates.

Returns:
java.lang.String.class

setUp

public String setUp()
Initializes the item for flow execution. Also calls the reset() method first before anything else.

Overrides:
setUp in class AbstractActor
Returns:
null if everything is fine, otherwise error message

createFilename

protected File createFilename(weka.core.Instances header)
Generates the filename for the output.

Parameters:
header - the current relation
Returns:
the generated filename
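
Example (illustrative only): when the relation name is used as filename, the
option text gives the case of '/some/where/file.arff' with relation 'anneal'
becoming '/some/where/anneal.arff'. The sketch below reproduces just that
substitution in plain Java. The class RelationFilenameSketch is hypothetical,
and the sketch does not handle relation names containing characters that are
invalid in file names.

```java
import java.io.File;

/** Hypothetical sketch, not part of ADAMS: swaps the file name for the relation name. */
final class RelationFilenameSketch {
    /** Keeps directory and extension of the output file, replaces the base name. */
    static File withRelationName(File output, String relation) {
        String name = output.getName();
        int dot = name.lastIndexOf('.');
        // keep the extension (including the dot) if there is one
        String extension = (dot >= 0) ? name.substring(dot) : "";
        return new File(output.getParentFile(), relation + extension);
    }

    public static void main(String[] args) {
        File result = withRelationName(new File("/some/where/file.arff"), "anneal");
        System.out.println(result.getPath());
    }
}
```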

createHeader

protected String createHeader(weka.core.Instances header)
Turns the dataset header into the appropriate format.

Parameters:
header - the header to convert
Returns:
the generated output
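
Example (illustrative only): for double arrays, all attributes are assumed to be
numeric. A header in ARFF format for such data could look like the output of the
sketch below. The class HeaderSketch and the attribute names 'att-1', 'att-2',
... are hypothetical; the actual names and layout produced by the actor are not
stated here. Relation names containing spaces would need quoting, which the
sketch omits.

```java
/** Hypothetical sketch, not part of ADAMS: builds an ARFF header for numeric attributes. */
final class HeaderSketch {
    /** Generates "@relation", one "@attribute ... numeric" per column, then "@data". */
    static String arffHeader(String relation, int numAttributes) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int i = 1; i <= numAttributes; i++) {
            sb.append("@attribute att-").append(i).append(" numeric\n");
        }
        sb.append("\n@data\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(arffHeader("demo", 2));
    }
}
```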

createRow

protected String createRow(weka.core.Instance row)
Turns the row into the appropriate format.

Parameters:
row - the row to convert
Returns:
the generated output
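
Example (illustrative only): the three output formats differ in how a row is
laid out. The sketch below joins the values of a double array with a comma for
ARFF and CSV and with a tab for TAB. This is an assumption about the layout, not
the actor's code; the class RowSketch is hypothetical, and the number formatting
(Double.toString) will differ from what Weka writes.

```java
import java.util.StringJoiner;

/** Hypothetical sketch, not part of ADAMS: formats a numeric row per output format. */
final class RowSketch {
    /** Mirrors the documented format names ARFF, CSV and TAB. */
    enum Format { ARFF, CSV, TAB }

    /** Joins the values with a tab for TAB, otherwise with a comma. */
    static String toRow(double[] values, Format format) {
        String separator = (format == Format.TAB) ? "\t" : ",";
        StringJoiner joiner = new StringJoiner(separator);
        for (double value : values) {
            joiner.add(Double.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRow(new double[] {1.0, 2.5}, Format.CSV));
    }
}
```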

writeToDisk

protected String writeToDisk(boolean append)
Writes the content of the buffer to disk.

Parameters:
append - whether to append
Returns:
error message if something went wrong, null otherwise
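
Example (illustrative only): the buffer size determines how many instances are
collected before they are written to disk. The sketch below shows the general
pattern in plain Java. The class BufferSketch is hypothetical, a StringBuilder
stands in for the output file, and when the real actor flushes any remaining
rows is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not part of ADAMS: buffers rows and writes them in batches. */
final class BufferSketch {
    private final int bufferSize;
    private final List<String> buffer = new ArrayList<>();
    private final StringBuilder sink = new StringBuilder(); // stand-in for the file
    private int flushes;

    BufferSketch(int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("buffer size must be at least 1");
        }
        this.bufferSize = bufferSize;
    }

    /** Buffers the row and writes the batch once the buffer is full. */
    void add(String row) {
        buffer.add(row);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    /** Appends all buffered rows to the sink and empties the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        for (String row : buffer) {
            sink.append(row).append('\n');
        }
        buffer.clear();
        flushes++;
    }

    int flushCount() { return flushes; }

    String written() { return sink.toString(); }

    public static void main(String[] args) {
        BufferSketch sketch = new BufferSketch(3);
        for (int i = 0; i < 4; i++) {
            sketch.add("row" + i);
        }
        sketch.flush(); // write whatever is left in the buffer
        System.out.print(sketch.written());
    }
}
```

With a buffer size of 1 (the documented default) every row is written
immediately; larger values trade memory for fewer write operations.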

doExecute

protected String doExecute()
Executes the flow item.

Specified by:
doExecute in class AbstractActor
Returns:
null if everything is fine, otherwise error message

wrapUp

public void wrapUp()
Cleans up after the execution has finished.

Overrides:
wrapUp in class AbstractTransformer


Copyright © 2012 University of Waikato, Hamilton, NZ. All Rights Reserved.