Class AttributeSummaryTransferFilter

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter

    public class AttributeSummaryTransferFilter
    extends weka.filters.SimpleBatchFilter
    implements weka.filters.UnsupervisedFilter
    Filter which trains another filter to summarise a subset of the data's attributes. The trained filter should be a supervised or unsupervised attribute filter. The summary filter is trained on a large set of unannotated data so that it can then be applied to a relatively small set that is annotated with other information.
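
The transfer idea described above can be illustrated without Weka. The following is only a minimal sketch under stated assumptions, not the filter's actual implementation: the hypothetical `TransferSketch` class stands in for a summary filter (the documented default is a PCA filter), it is fitted on the unannotated "training" rows only, and the fitted state is then applied to the annotated rows.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a summary filter; not part of ADAMS or Weka. */
final class TransferSketch {

    /** Learns per-column means from the training rows only. */
    static double[] fit(double[][] trainingRows) {
        int cols = trainingRows[0].length;
        double[] means = new double[cols];
        for (double[] row : trainingRows) {
            for (int c = 0; c < cols; c++) {
                means[c] += row[c] / trainingRows.length;
            }
        }
        return means;
    }

    /** Summarises each row as the average of its mean-centred values. */
    static double[] apply(double[] means, double[][] rows) {
        double[] summary = new double[rows.length];
        for (int r = 0; r < rows.length; r++) {
            double sum = 0.0;
            for (int c = 0; c < means.length; c++) {
                sum += rows[r][c] - means[c];
            }
            summary[r] = sum / means.length;
        }
        return summary;
    }

    public static void main(String[] args) {
        double[][] unannotated = {{1.0, 3.0}, {3.0, 5.0}};  // rows used for training
        double[][] annotated = {{2.0, 4.0}, {4.0, 6.0}};    // rows to be summarised
        double[] means = fit(unannotated);
        System.out.println(Arrays.toString(apply(means, annotated)));  // prints [0.0, 2.0]
    }
}
```

The point of the split is that `fit` never sees the annotated rows, so the summary applied to them is determined entirely by the larger training set.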

    Valid options are:

     -row-finder <value>
      Row finder which selects rows for training the attribute-summarising filter.
      (default: adams.data.weka.rowfinder.NullFinder)
     -column-finder <value>
      Column finder which selects attributes to summarise.
      (default: adams.data.weka.columnfinder.NullFinder)
     -summary-filter <value>
      The filter to use to summarise the attributes.
      (default: weka.filters.unsupervised.attribute.PrincipalComponentsJ -R 0.95 -A 5 -M -1)
     -preserve-id-column <value>
      Whether the first column of the test data should be treated as a sample ID and kept in the first position of the output.
      (default: off)
     -class-name <value>
      The name of the attribute to treat as the class for supervised filters.
      (default: )
     -keep-supervised-class <value>
      Whether the class value for supervised filters should be kept in the resultant dataset or discarded.
      (default: off)
     -output-debug-info
      If set, filter is run in debug mode and
      may output additional info to the console
     -do-not-check-capabilities
      If set, filter capabilities are not checked before filter is built
      (use with caution).
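
As a rough illustration of the option list above, the sketch below assembles an options array from the documented flags and default values using only the Java standard library. The class `OptionsSketch`, its helper methods, and the attribute name `label` are hypothetical. It also assumes that a nested filter specification and its own options are passed as a single array element, which is the usual Weka convention but is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Assembles an options array from the flags and defaults listed above. */
final class OptionsSketch {

    /** Returns flag/value pairs; the class name is only added when non-empty. */
    static String[] buildOptions(String className) {
        List<String> opts = new ArrayList<>();
        opts.add("-row-finder");
        opts.add("adams.data.weka.rowfinder.NullFinder");
        opts.add("-column-finder");
        opts.add("adams.data.weka.columnfinder.NullFinder");
        opts.add("-summary-filter");
        // Assumption: the nested filter and its own options travel as one string.
        opts.add("weka.filters.unsupervised.attribute.PrincipalComponentsJ -R 0.95 -A 5 -M -1");
        if (className != null && !className.isEmpty()) {
            opts.add("-class-name");
            opts.add(className);
        }
        return opts.toArray(new String[0]);
    }

    /** Joins the array into one command line, quoting elements that contain spaces. */
    static String toCommandLine(String[] options) {
        StringBuilder sb = new StringBuilder();
        for (String opt : options) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(opt.contains(" ") ? "\"" + opt + "\"" : opt);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "label" is a made-up attribute name used purely for illustration.
        System.out.println(toCommandLine(buildOptions("label")));
    }
}
```

With ADAMS and Weka on the classpath, an array of this shape would presumably be handed to `setOptions(String[])` on an instance of this filter.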
    Author:
    Corey Sterling (csterlin at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected adams.core.base.BaseString m_ClassName
      The class-attribute for supervised attribute filters.
      protected ColumnFinder m_ColumnFinder
      The column-finder which selects the attributes to summarise.
      protected ColumnSplitter m_ColumnSplitter
      Column-splitter for separating attributes to be summarised.
      protected ColumnSplitter m_IDSplitter
      Column-splitter for separating the ID column.
      protected boolean m_KeepSupervisedClass
      Whether to keep the supervised filter class or discard it.
      protected Simple m_Merger
      Merger for reconstructing partial datasets.
      protected boolean m_PreserveIDColumn
      Whether to treat the first attribute as an ID.
      protected RowFinder m_RowFinder
      The row-finder which separates training data from actual data.
      protected RowSplitter m_RowSplitter
      Row-splitter for splitting training and actual data.
      protected weka.filters.Filter m_SummaryFilter
      The filter which performs attribute summarising.
      protected ColumnSplitter m_SupervisedClassSplitter
      Column-splitter for removing the supervised filter class.
      • Fields inherited from class weka.filters.Filter

        m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
    • Method Summary

      Modifier and Type Method Description
      boolean allowAccessToFullInputFormat()
      Returns whether to allow the determineOutputFormat(Instances) method access to the full dataset rather than just the header.
      String classNameTipText()
      Gets the tip-text for the class-name option.
      String columnFinderTipText()
      Gets the tip-text for the column-finder option.
      protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns this.
      protected weka.core.Instances formatOutput(weka.core.Instances filterOutput, weka.core.Instances theRest)
      Handles merging of output datasets and formatting.
      weka.core.Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      adams.core.base.BaseString getClassName()
      Gets the name of the attribute to use as the class attribute for supervised summary filters.
      ColumnFinder getColumnFinder()
      Gets the column finder which selects the attributes for summarisation.
      adams.core.base.BaseString getDefaultClassName()
      Gets the name of the default attribute to use as the class attribute for supervised summary filters.
      ColumnFinder getDefaultColumnFinder()
      Gets the default column finder which selects the attributes for summarisation.
      RowFinder getDefaultRowFinder()
      Gets the default training data row selector.
      weka.filters.Filter getDefaultSummaryFilter()
      Gets the default filter to use to summarise the attributes.
      boolean getKeepSupervisedClass()
      Gets whether to keep the class attribute of the summary attributes in the final dataset.
      String[] getOptions()
      Returns the options of the current setup.
      boolean getPreserveIDColumn()
      Gets whether the first non-summary attribute should be treated as an ID and moved to the first attribute position.
      RowFinder getRowFinder()
      Gets the training data row selector.
      weka.filters.Filter getSummaryFilter()
      Gets the filter to use to summarise the attributes.
      String globalInfo()
      Returns a string describing this filter.
      String keepSupervisedClassTipText()
      Gets the tip-text for the keep-supervised-class option.
      Enumeration<weka.core.Option> listOptions()
      Gets an enumeration describing the available options.
      String preserveIDColumnTipText()
      Gets the tip-text for the preserve-id-column option.
      protected weka.core.Instances process(weka.core.Instances instances)
      Processes the given data (may change the provided dataset) and returns the modified version.
      String rowFinderTipText()
      Gets the tip-text for the row-finder option.
      void setClassName(adams.core.base.BaseString value)
      Sets the name of the attribute to use as the class attribute for supervised summary filters.
      void setColumnFinder(ColumnFinder value)
      Sets the column finder which selects the attributes for summarisation.
      void setKeepSupervisedClass(boolean value)
      Sets whether to keep the class attribute of the summary attributes in the final dataset.
      void setOptions(String[] options)
      Parses the options for this object.
      void setPreserveIDColumn(boolean value)
      Sets whether the first non-summary attribute should be treated as an ID and moved to the first attribute position.
      void setRowFinder(RowFinder value)
      Sets the training data row selector.
      void setSummaryFilter(weka.filters.Filter value)
      Sets the filter to use to summarise the attributes.
      String summaryFilterTipText()
      Gets the tip-text for the summary-filter option.
      • Methods inherited from class weka.filters.SimpleBatchFilter

        batchFinished, hasImmediateOutputFormat, input
      • Methods inherited from class weka.filters.SimpleFilter

        reset, setInputFormat
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, getRevision, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, main, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • m_RowFinder

        protected RowFinder m_RowFinder
        The row-finder which separates training data from actual data.
      • m_ColumnFinder

        protected ColumnFinder m_ColumnFinder
        The column-finder which selects the attributes to summarise.
      • m_SummaryFilter

        protected weka.filters.Filter m_SummaryFilter
        The filter which performs attribute summarising.
      • m_PreserveIDColumn

        protected boolean m_PreserveIDColumn
        Whether to treat the first attribute as an ID.
      • m_ClassName

        protected adams.core.base.BaseString m_ClassName
        The class-attribute for supervised attribute filters.
      • m_KeepSupervisedClass

        protected boolean m_KeepSupervisedClass
        Whether to keep the supervised filter class or discard it.
      • m_Merger

        protected Simple m_Merger
        Merger for reconstructing partial datasets.
      • m_RowSplitter

        protected RowSplitter m_RowSplitter
        Row-splitter for splitting training and actual data.
      • m_ColumnSplitter

        protected ColumnSplitter m_ColumnSplitter
        Column-splitter for separating attributes to be summarised.
      • m_IDSplitter

        protected ColumnSplitter m_IDSplitter
        Column-splitter for separating the ID column.
      • m_SupervisedClassSplitter

        protected ColumnSplitter m_SupervisedClassSplitter
        Column-splitter for removing the supervised filter class.
    • Constructor Detail

      • AttributeSummaryTransferFilter

        public AttributeSummaryTransferFilter()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the Explorer/Experimenter GUI
      • listOptions

        public Enumeration<weka.core.Option> listOptions()
        Gets an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • getOptions

        public String[] getOptions()
        Returns the options of the current setup.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        the current options
      • setOptions

        public void setOptions(String[] options)
                        throws Exception
        Parses the options for this object.
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the options to use
        Throws:
        Exception - if the option setting fails
      • getDefaultRowFinder

        public RowFinder getDefaultRowFinder()
        Gets the default training data row selector.
        Returns:
        The default training data row selector.
      • setRowFinder

        public void setRowFinder(RowFinder value)
        Sets the training data row selector.
        Parameters:
        value - The training data row selector.
      • getRowFinder

        public RowFinder getRowFinder()
        Gets the training data row selector.
        Returns:
        The training data row selector.
      • rowFinderTipText

        public String rowFinderTipText()
        Gets the tip-text for the row-finder option.
        Returns:
        The tip-text as a string.
      • getDefaultColumnFinder

        public ColumnFinder getDefaultColumnFinder()
        Gets the default column finder which selects the attributes for summarisation.
        Returns:
        The default column finder.
      • setColumnFinder

        public void setColumnFinder(ColumnFinder value)
        Sets the column finder which selects the attributes for summarisation.
        Parameters:
        value - The column finder.
      • getColumnFinder

        public ColumnFinder getColumnFinder()
        Gets the column finder which selects the attributes for summarisation.
        Returns:
        The column finder.
      • columnFinderTipText

        public String columnFinderTipText()
        Gets the tip-text for the column-finder option.
        Returns:
        The tip-text as a string.
      • getDefaultSummaryFilter

        public weka.filters.Filter getDefaultSummaryFilter()
        Gets the default filter to use to summarise the attributes.
        Returns:
        The default filter.
      • setSummaryFilter

        public void setSummaryFilter(weka.filters.Filter value)
        Sets the filter to use to summarise the attributes.
        Parameters:
        value - The filter.
      • getSummaryFilter

        public weka.filters.Filter getSummaryFilter()
        Gets the filter to use to summarise the attributes.
        Returns:
        The filter.
      • summaryFilterTipText

        public String summaryFilterTipText()
        Gets the tip-text for the summary-filter option.
        Returns:
        The tip-text as a string.
      • setPreserveIDColumn

        public void setPreserveIDColumn(boolean value)
        Sets whether the first non-summary attribute should be treated as an ID and moved to the first attribute position.
        Parameters:
        value - True to preserve the ID column, false to not.
      • getPreserveIDColumn

        public boolean getPreserveIDColumn()
        Gets whether the first non-summary attribute should be treated as an ID and moved to the first attribute position.
        Returns:
        True to preserve the ID column, false to not.
      • preserveIDColumnTipText

        public String preserveIDColumnTipText()
        Gets the tip-text for the preserve-id-column option.
        Returns:
        The tip-text as a string.
      • getDefaultClassName

        public adams.core.base.BaseString getDefaultClassName()
        Gets the name of the default attribute to use as the class attribute for supervised summary filters.
        Returns:
        The default attribute name.
      • setClassName

        public void setClassName(adams.core.base.BaseString value)
        Sets the name of the attribute to use as the class attribute for supervised summary filters.
        Parameters:
        value - The attribute name.
      • getClassName

        public adams.core.base.BaseString getClassName()
        Gets the name of the attribute to use as the class attribute for supervised summary filters.
        Returns:
        The attribute name.
      • classNameTipText

        public String classNameTipText()
        Gets the tip-text for the class-name option.
        Returns:
        The tip-text as a string.
      • setKeepSupervisedClass

        public void setKeepSupervisedClass(boolean value)
        Sets whether to keep the class attribute of the summary attributes in the final dataset.
        Parameters:
        value - True to keep the attribute in the final dataset, false to discard it.
      • getKeepSupervisedClass

        public boolean getKeepSupervisedClass()
        Gets whether to keep the class attribute of the summary attributes in the final dataset.
        Returns:
        True to keep the attribute in the final dataset, false to discard it.
      • keepSupervisedClassTipText

        public String keepSupervisedClassTipText()
        Gets the tip-text for the keep-supervised-class option.
        Returns:
        The tip-text as a string.
      • allowAccessToFullInputFormat

        public boolean allowAccessToFullInputFormat()
        Returns whether to allow the determineOutputFormat(Instances) method access to the full dataset rather than just the header.

        Default implementation returns false.

        Overrides:
        allowAccessToFullInputFormat in class weka.filters.SimpleBatchFilter
        Returns:
        whether determineOutputFormat has access to the full input dataset
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the Capabilities of this filter. Derived filters have to override this method to enable capabilities.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this object
        See Also:
        Capabilities
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns it. If the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, this method is called from batchFinished().
        Specified by:
        determineOutputFormat in class weka.filters.SimpleFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
        See Also:
        SimpleBatchFilter.hasImmediateOutputFormat(), SimpleBatchFilter.batchFinished()
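
The call order described above (the output format is determined from `batchFinished()` when it is not available immediately, and `process` then runs on the buffered data) can be sketched with a simplified mock. Everything below is hypothetical and uses plain arrays instead of `weka.core.Instances`. It paraphrases the description on this page and is not Weka's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified, hypothetical mock of a batch filter lifecycle (not Weka's classes). */
abstract class BatchFilterSketch {
    private final List<double[]> buffer = new ArrayList<>();
    private String[] outputFormat;
    private boolean firstBatchDone;

    /** Assumed here to be false: the format is only fixed once the batch is complete. */
    boolean hasImmediateOutputFormat() {
        return false;
    }

    abstract String[] determineOutputFormat(String[] inputFormat);

    abstract List<double[]> process(List<double[]> rows);

    /** Buffers one row until the batch is complete. */
    void input(double[] row) {
        buffer.add(row);
    }

    /** Fixes the output format on the first batch, then processes the buffered rows. */
    List<double[]> batchFinished(String[] inputFormat) {
        if (!hasImmediateOutputFormat() && !firstBatchDone) {
            outputFormat = determineOutputFormat(inputFormat);
        }
        List<double[]> result = process(new ArrayList<>(buffer));
        buffer.clear();
        firstBatchDone = true;
        return result;
    }

    String[] getOutputFormat() {
        return outputFormat;
    }
}

/** Toy filter that replaces all columns with a single row total. */
final class RowTotalFilter extends BatchFilterSketch {
    @Override
    String[] determineOutputFormat(String[] inputFormat) {
        return new String[] {"total"};
    }

    @Override
    List<double[]> process(List<double[]> rows) {
        List<double[]> out = new ArrayList<>();
        for (double[] row : rows) {
            out.add(new double[] {Arrays.stream(row).sum()});
        }
        return out;
    }

    public static void main(String[] args) {
        RowTotalFilter filter = new RowTotalFilter();
        filter.input(new double[] {1.0, 2.0});
        filter.input(new double[] {3.0, 4.0});
        List<double[]> result = filter.batchFinished(new String[] {"a", "b"});
        System.out.println(Arrays.toString(filter.getOutputFormat()) + " " + result.get(1)[0]);
    }
}
```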
      • process

        protected weka.core.Instances process(weka.core.Instances instances)
                                       throws Exception
        Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
        Specified by:
        process in class weka.filters.SimpleFilter
        Parameters:
        instances - the data to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong
        See Also:
        SimpleBatchFilter.batchFinished()
      • formatOutput

        protected weka.core.Instances formatOutput(weka.core.Instances filterOutput,
                                                   weka.core.Instances theRest)
        Handles merging of output datasets and formatting. Optionally moves the ID attribute to the first position. Optionally removes the class attribute for supervised filters.
        Parameters:
        filterOutput - The output of the attribute filter.
        theRest - The part of the input that was attribute-reduced.
        Returns:
        The formatted dataset.
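
The merge step described for `formatOutput` might look roughly like the sketch below, which works on attribute names only. The class `MergeSketch`, the column order (summary columns first, then the remaining columns), and the assumption that the ID is the first column of `theRest` are all illustrative choices that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of merging summary columns with the remaining columns. */
final class MergeSketch {

    /**
     * Concatenates both column lists, optionally drops the supervised class
     * attribute, and optionally moves the ID column to the first position.
     */
    static List<String> formatOutput(List<String> filterOutput, List<String> theRest,
                                     boolean preserveId, String classToDrop) {
        List<String> merged = new ArrayList<>(filterOutput);
        merged.addAll(theRest);
        if (classToDrop != null) {
            merged.remove(classToDrop);
        }
        if (preserveId && !theRest.isEmpty()) {
            String id = theRest.get(0);  // assumption: the ID is the first remaining column
            merged.remove(id);
            merged.add(0, id);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> summary = Arrays.asList("pc1", "pc2", "class");
        List<String> rest = Arrays.asList("id", "ref");
        System.out.println(formatOutput(summary, rest, true, "class"));  // prints [id, pc1, pc2, ref]
    }
}
```

Passing `null` for `classToDrop` corresponds to keeping the supervised class attribute, and `preserveId = false` leaves the concatenated order unchanged.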