Class WekaInstanceDumper

  • All Implemented Interfaces:
    AdditionalInformationHandler, BufferSupporter, CleanUpHandler, Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<Actor>, SizeOfHandler, Stoppable, StoppableWithFeedback, VariablesInspectionHandler, VariableChangeListener, Actor, ErrorHandler, FlushSupporter, InputConsumer, OutputProducer, Serializable, Comparable

    public class WekaInstanceDumper
    extends AbstractTransformer
    implements BufferSupporter, FlushSupporter
    Dumps weka.core.Instance objects into an ARFF file. If the headers change and the header-check is enabled, then a new file will be used.
    The actor can also turn double arrays into weka.core.Instance objects (all attributes are assumed to be numeric).

    Input/output:
    - accepts:
       weka.core.Instance
       double[]
    - generates:
       java.lang.String
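
    As a rough illustration of the double[] input, the sketch below builds an all-numeric ARFF header and data row using only the Java standard library instead of the WEKA API. The class ArffSketch, the attribute names att1..attN and the relation name are assumptions made for this example; relation names containing spaces would need quoting, which is left out.

```java
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not part of ADAMS or WEKA. */
final class ArffSketch {

  /** Builds an ARFF header with the given number of numeric attributes (names are made up). */
  static String header(String relation, int numAttributes) {
    StringBuilder sb = new StringBuilder();
    sb.append("@relation ").append(relation).append("\n\n");
    for (int i = 1; i <= numAttributes; i++)
      sb.append("@attribute att").append(i).append(" numeric\n");
    sb.append("\n@data\n");
    return sb.toString();
  }

  /** Turns one double array into a comma-separated ARFF data row. */
  static String row(double[] values) {
    StringJoiner joiner = new StringJoiner(",");
    for (double v : values)
      joiner.add(Double.toString(v));
    return joiner.toString();
  }
}
```

    Calling ArffSketch.header("data", 2) followed by ArffSketch.row(...) for each array yields a complete ARFF document.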


    Valid options are:

    -D <int> (property: debugLevel)
        The greater the number the more additional info the scheme may output to
        the console (0 = off).
        default: 0
        minimum: 0
     
    -name <java.lang.String> (property: name)
        The name of the actor.
        default: WekaInstanceDumper
     
    -annotation <adams.core.base.BaseText> (property: annotations)
        The annotations to attach to this actor.
        default:
     
    -skip (property: skip)
        If set to true, the transformation is skipped and the input token is forwarded
        as is.
     
    -stop-flow-on-error (property: stopFlowOnError)
        If set to true, the flow gets stopped in case this actor encounters an error;
         useful for critical actors.
     
    -check (property: checkHeader)
        Whether to check the headers - if the headers change, the Instance object
        gets dumped into a new file.
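
        The header check can be pictured as follows: a minimal sketch, assuming that a header is just a list of attribute names and that a counter selects the output file. HeaderCheckSketch and counterFor are hypothetical names, not the actor's code; WEKA itself offers Instances.equalHeaders for comparing dataset headers, though this page does not state how the actor performs the comparison.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of header checking; not the ADAMS implementation. */
final class HeaderCheckSketch {
  private final boolean checkHeader;
  private List<String> header;  // attribute names of the current file
  private int counter;          // incremented for every new file

  HeaderCheckSketch(boolean checkHeader) {
    this.checkHeader = checkHeader;
  }

  /** Returns the file counter to use for a row with the given attribute names. */
  int counterFor(List<String> attributeNames) {
    if (header == null) {
      // first row: remember its header and start the first file
      header = new ArrayList<>(attributeNames);
      counter = 1;
    } else if (checkHeader && !header.equals(attributeNames)) {
      // header changed and checking is enabled: switch to a new file
      header = new ArrayList<>(attributeNames);
      counter++;
    }
    return counter;
  }
}
```

        With checking disabled the counter never advances, so all rows end up in the same file regardless of their header.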
     
    -prefix <adams.core.io.PlaceholderFile> (property: outputPrefix)
        The path and partial filename of the output file; 'arff' and 'csv'
        extensions are removed, as they get added automatically.
        default: ${CWD}
     
    -format <ARFF|CSV|TAB> (property: outputFormat)
        The format to output the data in.
        default: ARFF
     
    -use-relation (property: useRelationNameAsFilename)
        If set to true, then the relation name replaces the name of the output file;
        e.g., if the output file is '/some/where/file.arff' and the relation is 'anneal',
        then the resulting file name will be '/some/where/anneal.arff'.
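
        The renaming in this example can be reproduced with java.nio.file.Path. This is only a sketch: RelationFilenameSketch and withRelationName are hypothetical names, and the sanitising of relation names that contain characters unsuitable for file names is left out.

```java
import java.nio.file.Path;

/** Hypothetical sketch; not the ADAMS implementation. */
final class RelationFilenameSketch {

  /** Replaces the file name of outputFile with relation + extension, keeping the directory. */
  static Path withRelationName(Path outputFile, String relation, String extension) {
    return outputFile.resolveSibling(relation + extension);
  }
}
```

        Passing the path '/some/where/file.arff', the relation 'anneal' and the extension '.arff' gives the path '/some/where/anneal.arff'.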
     
    -keep-existing (property: keepExisting)
        If enabled, any output file that exists when the actor is executed for the
        first time (or when variables modify the actor) won't get replaced with the
        current header; useful when outputting data from multiple locations in the
        flow, but one needs to be careful not to store mixed content (e.g., a varying
        number of attributes).
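
        One way to read this option is as a decision made on first execution: either start a fresh file containing the header, or leave an existing file alone so that rows can be appended to it. The sketch below shows that decision with the standard library; KeepExistingSketch and prepare are hypothetical names, and the actor's actual logic may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; not the ADAMS implementation. */
final class KeepExistingSketch {

  /**
   * Prepares the output file on first execution.
   * Returns true if a fresh file with the header was written,
   * false if an existing file was left untouched.
   */
  static boolean prepare(Path file, String header, boolean keepExisting) throws IOException {
    if (keepExisting && Files.exists(file))
      return false;  // keep whatever is there; rows get appended later
    // default open options: create the file, truncate it if it exists
    Files.write(file, header.getBytes(StandardCharsets.UTF_8));
    return true;
  }
}
```

        Nothing in this sketch verifies that the existing file has a compatible header, which is exactly the mixed-content risk the option text warns about.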
     
    -buffer-size <int> (property: bufferSize)
        The number of instances to buffer before writing to disk, in order to improve
        I/O performance.
        default: 1
        minimum: 1
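
        The effect of the buffer size can be sketched as follows: rows are collected in memory and written out in one go once the configured number is reached, with an explicit flush for whatever remains. RowBufferSketch is a hypothetical class, and a StringBuilder stands in for the output file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of row buffering; not the ADAMS implementation. */
final class RowBufferSketch {
  private final int bufferSize;
  private final List<String> buffer = new ArrayList<>();
  private final StringBuilder sink;  // stands in for the output file

  RowBufferSketch(int bufferSize, StringBuilder sink) {
    if (bufferSize < 1)
      throw new IllegalArgumentException("bufferSize must be at least 1");
    this.bufferSize = bufferSize;
    this.sink = sink;
  }

  /** Buffers the row and writes the whole buffer once it is full. */
  void add(String row) {
    buffer.add(row);
    if (buffer.size() >= bufferSize)
      flush();
  }

  /** Writes all buffered rows to the sink and empties the buffer. */
  void flush() {
    for (String row : buffer)
      sink.append(row).append('\n');
    buffer.clear();
  }
}
```

        With the default size of 1 every row is written immediately; larger sizes trade fewer writes for rows that stay in memory until the buffer fills or is flushed.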
     
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • BACKUP_HEADER

        public static final String BACKUP_HEADER
        the key for storing the header in the backup.
        See Also:
        Constant Field Values
      • BACKUP_COUNTER

        public static final String BACKUP_COUNTER
        the key for storing the counter in the backup.
        See Also:
        Constant Field Values
      • BACKUP_BUFFER

        public static final String BACKUP_BUFFER
        the key for storing the buffer in the backup.
        See Also:
        Constant Field Values
      • m_Header

        protected weka.core.Instances m_Header
        the header of the dataset.
      • m_Counter

        protected int m_Counter
        the counter for the filenames.
      • m_CheckHeader

        protected boolean m_CheckHeader
        whether to check the header.
      • m_OutputPrefix

        protected PlaceholderFile m_OutputPrefix
        the output prefix.
      • m_UseRelationNameAsFilename

        protected boolean m_UseRelationNameAsFilename
        whether to use the relation name as filename.
      • m_KeepExisting

        protected boolean m_KeepExisting
        whether to keep existing output files when actor is called for the first time, in order to allow appending to files from multiple locations in flow.
      • m_BufferSize

        protected int m_BufferSize
        the size of the buffer.
      • m_Buffer

        protected List<weka.core.Instance> m_Buffer
        the buffer.
      • m_Writing

        protected boolean m_Writing
        whether currently writing to disk.
    • Constructor Detail

      • WekaInstanceDumper

        public WekaInstanceDumper()
    • Method Detail

      • setCheckHeader

        public void setCheckHeader(boolean value)
        Sets whether to check the header or not.
        Parameters:
        value - if true then the headers get checked
      • getCheckHeader

        public boolean getCheckHeader()
        Returns whether the header gets checked or not.
        Returns:
        true if the header gets checked
      • checkHeaderTipText

        public String checkHeaderTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setOutputPrefix

        public void setOutputPrefix(PlaceholderFile value)
        Sets the prefix for the output (path + partial filename). Automatically removes .arff or .csv extensions from the partial file name since they get added automatically.
        Parameters:
        value - the prefix
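
        The extension handling described for this setter could look roughly like the sketch below. PrefixSketch and stripKnownExtension are hypothetical names, and matching the extensions case-insensitively is an assumption that this page does not confirm.

```java
/** Hypothetical sketch; not the ADAMS implementation. */
final class PrefixSketch {

  /** Removes a trailing ".arff" or ".csv" from the prefix (case-insensitive by assumption). */
  static String stripKnownExtension(String prefix) {
    for (String ext : new String[]{".arff", ".csv"}) {
      int start = prefix.length() - ext.length();
      // regionMatches returns false for a negative start, so short prefixes are safe
      if (prefix.regionMatches(true, start, ext, 0, ext.length()))
        return prefix.substring(0, start);
    }
    return prefix;
  }
}
```

        A prefix without one of the two extensions is returned unchanged.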
      • getOutputPrefix

        public PlaceholderFile getOutputPrefix()
        Returns the current output prefix.
        Returns:
        the prefix
      • outputPrefixTipText

        public String outputPrefixTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • outputFormatTipText

        public String outputFormatTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setUseRelationNameAsFilename

        public void setUseRelationNameAsFilename(boolean value)
        Sets whether to use the relation name as filename instead.
        Parameters:
        value - if true then the relation name will be used
      • getUseRelationNameAsFilename

        public boolean getUseRelationNameAsFilename()
        Returns whether the relation name is used as filename.
        Returns:
        true if the relation name is used
      • useRelationNameAsFilenameTipText

        public String useRelationNameAsFilenameTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setKeepExisting

        public void setKeepExisting(boolean value)
        Sets whether to keep any existing file on first execution.
        Parameters:
        value - if true then existing file is kept
      • getKeepExisting

        public boolean getKeepExisting()
        Returns whether any existing file is kept on first execution.
        Returns:
        true if existing file is kept
      • keepExistingTipText

        public String keepExistingTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setBufferSize

        public void setBufferSize(int value)
        Sets the number of instances to buffer before writing them to disk.
        Specified by:
        setBufferSize in interface BufferSupporter
        Parameters:
        value - the number of instances to buffer
      • getBufferSize

        public int getBufferSize()
        Returns the number of instances to buffer before writing them to disk.
        Specified by:
        getBufferSize in interface BufferSupporter
        Returns:
        the number of instances to buffer
      • bufferSizeTipText

        public String bufferSizeTipText()
        Returns the tip text for this property.
        Specified by:
        bufferSizeTipText in interface BufferSupporter
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • reset

        protected void reset()
        Resets the scheme.
        Overrides:
        reset in class AbstractActor
      • accepts

        public Class[] accepts()
        Returns the classes that the consumer accepts.
        Specified by:
        accepts in interface InputConsumer
        Returns:
        weka.core.Instance.class, double[].class
      • generates

        public Class[] generates()
        Returns the class of objects that it generates.
        Specified by:
        generates in interface OutputProducer
        Returns:
        java.lang.String.class
      • setUp

        public String setUp()
        Initializes the item for flow execution. Calls the reset() method before anything else.
        Specified by:
        setUp in interface Actor
        Overrides:
        setUp in class AbstractActor
        Returns:
        null if everything is fine, otherwise error message
        See Also:
        AbstractActor.reset()
      • createFilename

        protected File createFilename(weka.core.Instances header)
        Generates the filename for the output.
        Parameters:
        header - the current relation
        Returns:
        the generated filename
      • createHeader

        protected String createHeader(weka.core.Instances header)
        Turns the dataset header into the appropriate format.
        Parameters:
        header - the header to convert
        Returns:
        the generated output
      • createRow

        protected String createRow(weka.core.Instance row)
        Turns the row into the appropriate format.
        Parameters:
        row - the row to convert
        Returns:
        the generated output
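
        To make "the appropriate format" more concrete, here is a minimal sketch of format-dependent row output for the three formats ARFF, CSV and TAB. RowFormatSketch is a hypothetical class; it assumes comma-separated cells for ARFF and CSV and tab-separated cells for TAB, and it ignores quoting and missing values.

```java
/** Hypothetical sketch of format-dependent row output; not the ADAMS implementation. */
final class RowFormatSketch {

  /** Mirrors the three values of the -format option. */
  enum Format { ARFF, CSV, TAB }

  /** Assumed cell separator: tab for TAB, comma for ARFF and CSV. */
  static String separator(Format format) {
    return (format == Format.TAB) ? "\t" : ",";
  }

  /** Joins the already stringified cells of one row. */
  static String row(Format format, String[] cells) {
    return String.join(separator(format), cells);
  }
}
```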
      • writeToDisk

        protected String writeToDisk(boolean append)
        Writes the content of the buffer to disk.
        Parameters:
        append - whether to append
        Returns:
        error message if something went wrong, null otherwise
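
        The return convention used here (null on success, an error message otherwise) together with the append flag can be sketched with the standard library. WriteSketch is a hypothetical class whose method takes the file and rows as parameters, unlike the actor's method, which works on its internal buffer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical sketch; not the ADAMS implementation. */
final class WriteSketch {

  /** Writes the rows to the file; returns null on success, otherwise an error message. */
  static String writeToDisk(Path file, List<String> rows, boolean append) {
    try {
      if (append)
        // add the rows to the end of the file, creating it if necessary
        Files.write(file, rows, StandardCharsets.UTF_8,
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
      else
        // default open options: create the file, truncate it if it exists
        Files.write(file, rows, StandardCharsets.UTF_8);
      return null;
    } catch (IOException e) {
      return "Failed to write to '" + file + "': " + e;
    }
  }
}
```

        Returning a message instead of throwing lets the caller pass the error up unchanged, in line with the other methods on this page that return null when everything is fine.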
      • updateVariables

        protected String updateVariables()
        Gets called when the actor needs to be set up again because a variable has changed.
        Overrides:
        updateVariables in class AbstractActor
        Returns:
        null if everything is fine, otherwise error message
      • doExecute

        protected String doExecute()
        Executes the flow item.
        Specified by:
        doExecute in class AbstractActor
        Returns:
        null if everything is fine, otherwise error message