Class WekaInstanceDumper

  • All Implemented Interfaces:
    adams.core.AdditionalInformationHandler, adams.core.BufferSupporter, adams.core.CleanUpHandler, adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.QuickInfoSupporter, adams.core.ShallowCopySupporter<adams.flow.core.Actor>, adams.core.SizeOfHandler, adams.core.Stoppable, adams.core.StoppableWithFeedback, adams.core.VariablesInspectionHandler, adams.event.VariableChangeListener, adams.flow.core.Actor, adams.flow.core.ErrorHandler, adams.flow.core.FlushSupporter, adams.flow.core.InputConsumer, adams.flow.core.OutputProducer, Serializable, Comparable

    public class WekaInstanceDumper
    extends adams.flow.transformer.AbstractTransformer
    implements adams.core.BufferSupporter, adams.flow.core.FlushSupporter
    Dumps weka.core.Instance objects into an ARFF file. If the headers change and the header-check is enabled, then a new file will be used.
    The actor can also turn double arrays into weka.core.Instance objects (all attributes are assumed to be numeric).
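The header-check behaviour described above can be pictured with a small plain-Java sketch. It uses neither the ADAMS nor the Weka classes: a list of attribute names stands in for the dataset header, and the `out-N.arff` naming pattern, the class name and the method name are hypothetical, chosen only for illustration. The real actor may name its files differently.

```java
import java.util.List;

/** Plain-Java illustration only; not the actor's actual implementation. */
final class HeaderRolloverSketch {
  private List<String> currentHeader;  // attribute names standing in for a dataset header
  private int counter;                 // hypothetical counter used to tell output files apart
  private final boolean checkHeader;   // mirrors the idea of the header-check option

  HeaderRolloverSketch(boolean checkHeader) {
    this.checkHeader = checkHeader;
  }

  /** Returns the (hypothetical) file name to use for a row that carries the given header. */
  String fileFor(List<String> header) {
    if (currentHeader == null) {
      // first row seen: remember its header and start with the first file
      currentHeader = header;
      counter = 1;
    } else if (checkHeader && !currentHeader.equals(header)) {
      // header changed while checking is enabled: switch to a new file
      currentHeader = header;
      counter++;
    }
    return "out-" + counter + ".arff";
  }
}
```

With checking disabled, the sketch keeps returning the first file name even when the header changes, which is how mixed content can end up in one file.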

    Input/output:
    - accepts:
       weka.core.Instance
       double[]
    - generates:
       java.lang.String
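
As a rough illustration of what "all attributes are assumed to be numeric" means for a double[] input, the following plain-Java sketch builds ARFF text by hand, without the Weka classes. The attribute names (`att1`, `att2`, ...), the class name and the method names are assumptions made for this example; the names that the actor actually generates are not stated here.

```java
import java.util.StringJoiner;

/** Plain-Java illustration only; it writes ARFF text directly instead of using Weka. */
final class NumericArffSketch {

  /** Builds an ARFF header that declares one numeric attribute per array element. */
  static String header(String relation, int numAttributes) {
    StringBuilder sb = new StringBuilder("@relation " + relation + "\n\n");
    for (int i = 1; i <= numAttributes; i++) {
      // "att" + index is a hypothetical naming scheme
      sb.append("@attribute att").append(i).append(" numeric\n");
    }
    return sb.append("\n@data\n").toString();
  }

  /** Turns the array into a single comma-separated ARFF data row. */
  static String row(double[] values) {
    StringJoiner joiner = new StringJoiner(",");
    for (double v : values) {
      joiner.add(Double.toString(v));
    }
    return joiner.toString();
  }
}
```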


    Valid options are:

    -D <int> (property: debugLevel)
        The greater the number the more additional info the scheme may output to
        the console (0 = off).
        default: 0
        minimum: 0
     
    -name <java.lang.String> (property: name)
        The name of the actor.
        default: WekaInstanceDumper
     
    -annotation <adams.core.base.BaseText> (property: annotations)
        The annotations to attach to this actor.
        default:
     
    -skip (property: skip)
        If set to true, transformation is skipped and the input token is just forwarded
        as it is.
     
    -stop-flow-on-error (property: stopFlowOnError)
        If set to true, the flow gets stopped in case this actor encounters an error;
         useful for critical actors.
     
    -check (property: checkHeader)
        Whether to check the headers - if the headers change, the Instance object
        gets dumped into a new file.
     
    -prefix <adams.core.io.PlaceholderFile> (property: outputPrefix)
        The path and partial filename of the output file; automatically removes
        'arff' and 'csv' extensions, as they get added automatically.
        default: ${CWD}
     
    -format <ARFF|CSV|TAB> (property: outputFormat)
        The format to output the data in.
        default: ARFF
     
    -use-relation (property: useRelationNameAsFilename)
        If set to true, then the relation name replaces the name of the output file;
        e.g., if the output file is '/some/where/file.arff' and the relation is 'anneal',
        then the resulting file name will be '/some/where/anneal.arff'.
     
    -keep-existing (property: keepExisting)
        If enabled, any output file that exists when the actor is executed for the
        first time (or when variables modify the actor) won't get replaced with the
        current header; useful when outputting data from multiple locations in the
        flow, but one needs to be cautious not to store mixed content (e.g., a varying
        number of attributes).
     
    -buffer-size <int> (property: bufferSize)
        The number of instances to buffer before writing to disk, in order to improve
        I/O performance.
        default: 1
        minimum: 1
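
The interplay of the -prefix and -use-relation options can be sketched in plain Java as follows. This is a minimal sketch rather than the actor's code: the class and method names are hypothetical, the extension matching is assumed to be case-sensitive, and the extension to append is passed in explicitly instead of being derived from -format.

```java
import java.io.File;

/** Plain-Java illustration only; names and matching rules are assumptions. */
final class OutputNameSketch {

  /** Removes a trailing ".arff" or ".csv" from the partial file name, if present. */
  static String stripKnownExtension(String name) {
    if (name.endsWith(".arff")) {
      return name.substring(0, name.length() - ".arff".length());
    }
    if (name.endsWith(".csv")) {
      return name.substring(0, name.length() - ".csv".length());
    }
    return name;
  }

  /**
   * Combines the prefix's directory with either the relation name or the
   * stripped prefix name, then appends the given extension.
   */
  static File resolve(File prefix, String relation, boolean useRelation, String extension) {
    String base = useRelation ? relation : stripKnownExtension(prefix.getName());
    return new File(prefix.getParentFile(), base + extension);
  }
}
```

For the example from the -use-relation description, `resolve(new File("/some/where/file.arff"), "anneal", true, ".arff")` yields a file in the same directory whose name is `anneal.arff`.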
     
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static String BACKUP_BUFFER
      the key for storing the buffer in the backup.
      static String BACKUP_COUNTER
      the key for storing the counter in the backup.
      static String BACKUP_HEADER
      the key for storing the header in the backup.
      protected List<weka.core.Instance> m_Buffer
      the buffer.
      protected int m_BufferSize
      the size of the buffer.
      protected boolean m_CheckHeader
      whether to check the header.
      protected int m_Counter
      the counter for the filenames.
      protected weka.core.Instances m_Header
      the header of the dataset.
      protected boolean m_KeepExisting
      whether to keep existing output files when actor is called for the first time, in order to allow appending to files from multiple locations in flow.
      protected WekaInstanceDumper.OutputFormat m_OutputFormat
      the output format.
      protected adams.core.io.PlaceholderFile m_OutputPrefix
      the output prefix.
      protected boolean m_UseRelationNameAsFilename
      whether to use the relation name as filename.
      protected boolean m_Writing
      whether currently writing to disk.
      • Fields inherited from class adams.flow.transformer.AbstractTransformer

        BACKUP_INPUT, BACKUP_OUTPUT, m_InputToken, m_OutputToken
      • Fields inherited from class adams.flow.core.AbstractActor

        m_Annotations, m_BackupState, m_DetectedObjectVariables, m_DetectedVariables, m_ErrorHandler, m_Executed, m_Executing, m_ExecutionListeningSupporter, m_FullName, m_LoggingPrefix, m_Name, m_Parent, m_ScopeHandler, m_Self, m_Silent, m_Skip, m_StopFlowOnError, m_StopMessage, m_Stopped, m_StorageHandler, m_VariablesUpdated
      • Fields inherited from class adams.core.option.AbstractOptionHandler

        m_OptionManager
      • Fields inherited from class adams.core.logging.LoggingObject

        m_Logger, m_LoggingIsEnabled, m_LoggingLevel
      • Fields inherited from interface adams.flow.core.Actor

        FILE_EXTENSION, FILE_EXTENSION_GZ
    • Method Summary

      Modifier and Type Method Description
      Class[] accepts()
      Returns the classes that the consumer accepts.
      protected Hashtable<String,​Object> backupState()
      Backs up the current state of the actor before updating the variables.
      String bufferSizeTipText()
      Returns the tip text for this property.
      String checkHeaderTipText()
      Returns the tip text for this property.
      protected File createFilename​(weka.core.Instances header)
      Generates the filename for the output.
      protected String createHeader​(weka.core.Instances header)
      Turns the dataset header into the appropriate format.
      protected String createRow​(weka.core.Instance row)
      Turns the row into the appropriate format.
      void defineOptions()
      Adds options to the internal list of options.
      protected String doExecute()
      Executes the flow item.
      Class[] generates()
      Returns the class of objects that it generates.
      int getBufferSize()
      Returns the number of instances to buffer before writing them to disk.
      boolean getCheckHeader()
      Returns whether the header gets checked or not.
      boolean getKeepExisting()
      Returns whether any existing file is kept on first execution.
      WekaInstanceDumper.OutputFormat getOutputFormat()
      Returns the current output format.
      adams.core.io.PlaceholderFile getOutputPrefix()
      Returns the current output prefix.
      String getQuickInfo()
      Returns a quick info about the actor, which will be displayed in the GUI.
      boolean getUseRelationNameAsFilename()
      Returns whether the relation name is used as filename.
      String globalInfo()
      Returns a string describing the object.
      protected void initialize()
      Initializes the members.
      String keepExistingTipText()
      Returns the tip text for this property.
      String outputFormatTipText()
      Returns the tip text for this property.
      String outputPrefixTipText()
      Returns the tip text for this property.
      void performFlush()
      Performs the flush.
      protected void pruneBackup()
      Removes entries from the backup.
      protected void reset()
      Resets the scheme.
      protected void restoreState​(Hashtable<String,​Object> state)
      Restores the state of the actor before the variables got updated.
      void setBufferSize​(int value)
      Sets the number of instances to buffer before writing them to disk.
      void setCheckHeader​(boolean value)
      Sets whether to check the header or not.
      void setKeepExisting​(boolean value)
      Sets whether to keep any existing file on first execution.
      void setOutputFormat​(WekaInstanceDumper.OutputFormat value)
      Sets the output format.
      void setOutputPrefix​(adams.core.io.PlaceholderFile value)
      Sets the prefix for the output (path + partial filename).
      String setUp()
      Initializes the item for flow execution.
      void setUseRelationNameAsFilename​(boolean value)
      Sets whether to use the relation name as filename instead.
      protected String updateVariables()
      Gets called when the actor needs to be set up again because a variable changed.
      String useRelationNameAsFilenameTipText()
      Returns the tip text for this property.
      void wrapUp()
      Cleans up after the execution has finished.
      protected String writeToDisk​(boolean append)
      Writes the content of the buffer to disk.
      • Methods inherited from class adams.flow.transformer.AbstractTransformer

        currentInput, execute, hasInput, hasPendingOutput, input, output, postExecute
      • Methods inherited from class adams.flow.core.AbstractActor

        annotationsTipText, canInspectOptions, canPerformSetUpCheck, cleanUp, compareTo, configureLogger, destroy, equals, finalUpdateVariables, findVariables, findVariables, forceVariables, forCommandLine, forName, forName, getAdditionalInformation, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowActors, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, handleException, hasErrorHandler, hasStopMessage, index, isBackedUp, isExecuted, isExecuting, isFinished, isHeadless, isStopped, nameTipText, performSetUpChecks, performVariableChecks, preExecute, pruneBackup, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setVariables, shallowCopy, shallowCopy, silentTipText, sizeOf, skipTipText, stopExecution, stopExecution, stopFlowOnErrorTipText, updateDetectedVariables, updatePrefix, variableChanged
      • Methods inherited from class adams.core.option.AbstractOptionHandler

        cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, setLoggingLevel, toCommandLine, toString
      • Methods inherited from class adams.core.logging.LoggingObject

        getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled
      • Methods inherited from interface adams.flow.core.Actor

        cleanUp, compareTo, destroy, equals, execute, findVariables, getAnnotations, getDefaultName, getDetectedVariables, getErrorHandler, getFlowExecutionListeningSupporter, getFullName, getName, getNextSibling, getParent, getParentComponent, getPreviousSibling, getRoot, getScopeHandler, getSilent, getSkip, getStopFlowOnError, getStopMessage, getStorageHandler, getVariables, handleError, hasErrorHandler, hasStopMessage, index, isExecuted, isFinished, isHeadless, isStopped, setAnnotations, setErrorHandler, setName, setParent, setSilent, setSkip, setStopFlowOnError, setVariables, shallowCopy, shallowCopy, sizeOf, stopExecution, stopExecution, toCommandLine, variableChanged
      • Methods inherited from interface adams.core.AdditionalInformationHandler

        getAdditionalInformation
      • Methods inherited from interface adams.core.logging.LoggingLevelHandler

        getLoggingLevel, setLoggingLevel
      • Methods inherited from interface adams.core.logging.LoggingSupporter

        getLogger, isLoggingEnabled
      • Methods inherited from interface adams.core.option.OptionHandler

        cleanUpOptions, getOptionManager
      • Methods inherited from interface adams.core.VariablesInspectionHandler

        canInspectOptions
    • Field Detail

      • BACKUP_HEADER

        public static final String BACKUP_HEADER
        the key for storing the header in the backup.
        See Also:
        Constant Field Values
      • BACKUP_COUNTER

        public static final String BACKUP_COUNTER
        the key for storing the counter in the backup.
        See Also:
        Constant Field Values
      • BACKUP_BUFFER

        public static final String BACKUP_BUFFER
        the key for storing the buffer in the backup.
        See Also:
        Constant Field Values
      • m_Header

        protected weka.core.Instances m_Header
        the header of the dataset.
      • m_Counter

        protected int m_Counter
        the counter for the filenames.
      • m_CheckHeader

        protected boolean m_CheckHeader
        whether to check the header.
      • m_OutputPrefix

        protected adams.core.io.PlaceholderFile m_OutputPrefix
        the output prefix.
      • m_UseRelationNameAsFilename

        protected boolean m_UseRelationNameAsFilename
        whether to use the relation name as filename.
      • m_KeepExisting

        protected boolean m_KeepExisting
        whether to keep existing output files when actor is called for the first time, in order to allow appending to files from multiple locations in flow.
      • m_BufferSize

        protected int m_BufferSize
        the size of the buffer.
      • m_Buffer

        protected List<weka.core.Instance> m_Buffer
        the buffer.
      • m_Writing

        protected boolean m_Writing
        whether currently writing to disk.
    • Constructor Detail

      • WekaInstanceDumper

        public WekaInstanceDumper()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing the object.
        Specified by:
        globalInfo in interface adams.core.GlobalInfoSupporter
        Specified by:
        globalInfo in class adams.core.option.AbstractOptionHandler
        Returns:
        a description suitable for displaying in the GUI
      • defineOptions

        public void defineOptions()
        Adds options to the internal list of options.
        Specified by:
        defineOptions in interface adams.core.option.OptionHandler
        Overrides:
        defineOptions in class adams.flow.core.AbstractActor
      • initialize

        protected void initialize()
        Initializes the members.
        Overrides:
        initialize in class adams.flow.core.AbstractActor
      • getQuickInfo

        public String getQuickInfo()
        Returns a quick info about the actor, which will be displayed in the GUI.
        Specified by:
        getQuickInfo in interface adams.flow.core.Actor
        Specified by:
        getQuickInfo in interface adams.core.QuickInfoSupporter
        Overrides:
        getQuickInfo in class adams.flow.core.AbstractActor
        Returns:
        null if no info available, otherwise short string
      • setCheckHeader

        public void setCheckHeader​(boolean value)
        Sets whether to check the header or not.
        Parameters:
        value - if true then the headers get checked
      • getCheckHeader

        public boolean getCheckHeader()
        Returns whether the header gets checked or not.
        Returns:
        true if the header gets checked
      • checkHeaderTipText

        public String checkHeaderTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setOutputPrefix

        public void setOutputPrefix​(adams.core.io.PlaceholderFile value)
        Sets the prefix for the output (path + partial filename). Automatically removes .arff or .csv extensions from the partial file name since they get added automatically.
        Parameters:
        value - the prefix
      • getOutputPrefix

        public adams.core.io.PlaceholderFile getOutputPrefix()
        Returns the current output prefix.
        Returns:
        the prefix
      • outputPrefixTipText

        public String outputPrefixTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • outputFormatTipText

        public String outputFormatTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setUseRelationNameAsFilename

        public void setUseRelationNameAsFilename​(boolean value)
        Sets whether to use the relation name as filename instead.
        Parameters:
        value - if true then the relation name will be used
      • getUseRelationNameAsFilename

        public boolean getUseRelationNameAsFilename()
        Returns whether the relation name is used as filename.
        Returns:
        true if the relation name is used
      • useRelationNameAsFilenameTipText

        public String useRelationNameAsFilenameTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setKeepExisting

        public void setKeepExisting​(boolean value)
        Sets whether to keep any existing file on first execution.
        Parameters:
        value - if true then existing file is kept
      • getKeepExisting

        public boolean getKeepExisting()
        Returns whether any existing file is kept on first execution.
        Returns:
        true if existing file is kept
      • keepExistingTipText

        public String keepExistingTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setBufferSize

        public void setBufferSize​(int value)
        Sets the number of instances to buffer before writing them to disk.
        Specified by:
        setBufferSize in interface adams.core.BufferSupporter
        Parameters:
        value - the number of instances to buffer
      • getBufferSize

        public int getBufferSize()
        Returns the number of instances to buffer before writing them to disk.
        Specified by:
        getBufferSize in interface adams.core.BufferSupporter
        Returns:
        the number of instances to buffer
      • bufferSizeTipText

        public String bufferSizeTipText()
        Returns the tip text for this property.
        Specified by:
        bufferSizeTipText in interface adams.core.BufferSupporter
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • pruneBackup

        protected void pruneBackup()
        Removes entries from the backup.
        Overrides:
        pruneBackup in class adams.flow.core.AbstractActor
        See Also:
        reset()
      • backupState

        protected Hashtable<String,​Object> backupState()
        Backs up the current state of the actor before updating the variables.
        Overrides:
        backupState in class adams.flow.transformer.AbstractTransformer
        Returns:
        the backup
      • restoreState

        protected void restoreState​(Hashtable<String,​Object> state)
        Restores the state of the actor before the variables got updated.
        Overrides:
        restoreState in class adams.flow.transformer.AbstractTransformer
        Parameters:
        state - the backup of the state to restore from
      • reset

        protected void reset()
        Resets the scheme.
        Overrides:
        reset in class adams.flow.core.AbstractActor
      • accepts

        public Class[] accepts()
        Returns the classes that the consumer accepts.
        Specified by:
        accepts in interface adams.flow.core.InputConsumer
        Returns:
        weka.core.Instance.class, double[].class
      • generates

        public Class[] generates()
        Returns the class of objects that it generates.
        Specified by:
        generates in interface adams.flow.core.OutputProducer
        Returns:
        java.lang.String.class
      • setUp

        public String setUp()
        Initializes the item for flow execution. Also calls the reset() method first before anything else.
        Specified by:
        setUp in interface adams.flow.core.Actor
        Overrides:
        setUp in class adams.flow.core.AbstractActor
        Returns:
        null if everything is fine, otherwise error message
      • createFilename

        protected File createFilename​(weka.core.Instances header)
        Generates the filename for the output.
        Parameters:
        header - the current relation
        Returns:
        the generated filename
      • createHeader

        protected String createHeader​(weka.core.Instances header)
        Turns the dataset header into the appropriate format.
        Parameters:
        header - the header to convert
        Returns:
        the generated output
      • createRow

        protected String createRow​(weka.core.Instance row)
        Turns the row into the appropriate format.
        Parameters:
        row - the row to convert
        Returns:
        the generated output
      • writeToDisk

        protected String writeToDisk​(boolean append)
        Writes the content of the buffer to disk.
        Parameters:
        append - whether to append
        Returns:
        error message if something went wrong, null otherwise
      • updateVariables

        protected String updateVariables()
        Gets called when the actor needs to be set up again because a variable changed.
        Overrides:
        updateVariables in class adams.flow.core.AbstractActor
        Returns:
        null if everything is fine, otherwise error message
      • doExecute

        protected String doExecute()
        Executes the flow item.
        Specified by:
        doExecute in class adams.flow.core.AbstractActor
        Returns:
        null if everything is fine, otherwise error message
      • wrapUp

        public void wrapUp()
        Cleans up after the execution has finished.
        Specified by:
        wrapUp in interface adams.flow.core.Actor
        Overrides:
        wrapUp in class adams.flow.transformer.AbstractTransformer
      • performFlush

        public void performFlush()
        Performs the flush.
        Specified by:
        performFlush in interface adams.flow.core.FlushSupporter
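
To illustrate how a buffer size and an explicit flush typically work together, here is a minimal plain-Java sketch. It is not the actor's implementation: the class and its methods are hypothetical, and a StringBuilder stands in for the output file so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration only; a StringBuilder replaces the real output file. */
final class BufferedDumpSketch {
  private final int bufferSize;
  private final List<String> buffer = new ArrayList<>();
  private final StringBuilder sink = new StringBuilder();
  private int writes;  // counts how often the buffer was written out

  BufferedDumpSketch(int bufferSize) {
    if (bufferSize < 1) {
      throw new IllegalArgumentException("buffer size must be >= 1");
    }
    this.bufferSize = bufferSize;
  }

  /** Buffers a row and writes the buffer out once it holds bufferSize rows. */
  void add(String row) {
    buffer.add(row);
    if (buffer.size() >= bufferSize) {
      flush();
    }
  }

  /** Writes out whatever is still buffered; does nothing if the buffer is empty. */
  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    for (String row : buffer) {
      sink.append(row).append('\n');
    }
    buffer.clear();
    writes++;
  }

  int writes() {
    return writes;
  }

  String written() {
    return sink.toString();
  }
}
```

With a buffer size of 1 every row is written immediately. Larger values reduce the number of writes, at the cost of rows sitting in memory until the buffer fills up or a flush is requested.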