Class MergeManyAttributes

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter

    public class MergeManyAttributes
    extends weka.filters.SimpleBatchFilter
    implements weka.filters.UnsupervisedFilter
    Merges two or more attributes and offers several strategies for handling values that differ or are missing.
    Uses the common subsequence (from either the start or the end) of the attribute names as the name of the merged attribute; otherwise the names are concatenated (separated by '-'). If the new name is already present, a number is appended to make it unique.
    The merged attribute can either be left at the default position (the position of whichever merged attribute comes first) or moved to a specific one.
    If one of the attributes to be merged is the current class attribute, the newly created merged attribute will become the new class attribute.
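The clean-up and uniqueness rules for the merged name can be sketched in plain Java. This is a minimal illustration, not the filter's actual code: the class `MergedNameSketch` and its methods are hypothetical, the removal characters are assumed to be space, hyphen, underscore and period (as the `-remove-chars` default suggests), and the way the counter is appended is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of MergeManyAttributes.
class MergedNameSketch {

  /** Strips any of the given characters from the start and end of the name. */
  static String trim(String name, String removeChars) {
    int start = 0;
    int end = name.length();
    while (start < end && removeChars.indexOf(name.charAt(start)) >= 0)
      start++;
    while (end > start && removeChars.indexOf(name.charAt(end - 1)) >= 0)
      end--;
    return name.substring(start, end);
  }

  /** Appends a counter until the name no longer clashes (assumed scheme). */
  static String makeUnique(String name, Set<String> existing) {
    if (!existing.contains(name))
      return name;
    int i = 2;
    while (existing.contains(name + i))
      i++;
    return name + i;
  }

  public static void main(String[] args) {
    // "temp" and "temp2" already exist, so the cleaned name becomes "temp3"
    Set<String> existing = new HashSet<>(Arrays.asList("temp", "temp2"));
    String cleaned = trim("temp_", " -_.");
    System.out.println(makeUnique(cleaned, existing));
  }
}
```

The real filter may number clashing names differently; only the general idea of trimming and then appending a number is taken from the description above.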

    Valid options are:

     -D
      Turns on output of debugging information.
     -att-name <att name>
      The name of the attribute, can be supplied multiple times.
     -remove-chars <chars>
      The characters to remove from the start/end of the
      generated name for the merged attribute.
      (default:  -_.)
     -merged-index <position>
      The new position for the merged attribute.
      Empty string is default position, i.e., the position of
      whichever merged attribute comes first
      (default: )
     -differ <MISSING|AVERAGE>
      The strategy to apply in case the values of the attributes differ.
      (default: MISSING)
     -one-missing <MISSING|PRESENT>
      The strategy to apply in case one of the values is missing.
      (default: MISSING)
    Version:
    $Revision$
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static String DEFAULT_REMOVE_CHARS
      characters to remove from start/end of the merged name.
      protected BaseString[] m_AttributeNames
      the attribute names.
      protected int m_Differ
      how to handle differing values.
      protected String m_Merged
      the name of the merged attribute.
      protected weka.core.SingleIndex m_MergedIndex
      the position for the merged attribute (empty = leave at default position).
      protected int m_OneMissing
      what to do if one value is missing.
      protected String m_RemoveChars
      the characters to remove from the merged name (start/end).
      static int ONEMISSING_MISSING
      what to do if one is missing: missing.
      static int ONEMISSING_USE_FIRST_PRESENT
      what to do if one is missing: use first present value.
      static weka.core.Tag[] TAGS_ONEMISSING  
      static weka.core.Tag[] TAGS_VALUESDIFFER
      the types of how to handle differing values.
      static int VALUESDIFFER_AVERAGE
      how to handle differing values: average.
      static int VALUESDIFFER_MISSING
      how to handle differing values: missing.
      • Fields inherited from class weka.filters.Filter

        m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
    • Method Summary

      Modifier and Type Method Description
      String attributeNamesTipText()
      Returns the tip text for this property.
      protected String commonSubsequence​(String s1, String s2, boolean forward)
      Determines the common subsequence of the two strings.
      protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns this.
      String differTipText()
      Returns the tip text for this property.
      BaseString[] getAttributeNames()
      Gets the names of the attributes.
      weka.core.Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      weka.core.SelectedTag getDiffer()
      Gets the type of strategy to apply if the two values differ.
      String getMergedIndex()
      Gets the position for the merged attribute.
      weka.core.SelectedTag getOneMissing()
      Gets the type of strategy to apply if one of the values is missing.
      String[] getOptions()
      Returns the options of the current setup.
      String getRemoveChars()
      Gets the characters to remove from start/end of the generated name.
      String getRevision()
      Returns the revision string.
      String globalInfo()
      Returns a string describing this filter.
      Enumeration listOptions()
      Gets an enumeration describing the available options.
      static void main​(String[] args)
      Runs the filter with the given arguments.
      String mergedIndexTipText()
      Returns the tip text for this property.
      String oneMissingTipText()
      Returns the tip text for this property.
      protected weka.core.Instances process​(weka.core.Instances instances)
      Processes the given data (may change the provided dataset) and returns the modified version.
      String removeCharsTipText()
      Returns the tip text for this property.
      void setAttributeNames​(BaseString[] value)
      Sets the names of the attributes.
      void setDiffer​(weka.core.SelectedTag value)
      Sets the type of strategy to apply if the two values differ.
      void setMergedIndex​(String value)
      Sets the position for the merged attribute.
      void setOneMissing​(weka.core.SelectedTag value)
      Sets the type of strategy to apply if one of the values is missing.
      void setOptions​(String[] options)
      Parses the options for this object.
      void setRemoveChars​(String value)
      Sets the characters to remove from start/end of the generated name.
      • Methods inherited from class weka.filters.SimpleBatchFilter

        allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input, input
      • Methods inherited from class weka.filters.SimpleFilter

        reset, setInputFormat
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • VALUESDIFFER_MISSING

        public static final int VALUESDIFFER_MISSING
        how to handle differing values: missing.
        See Also:
        Constant Field Values
      • VALUESDIFFER_AVERAGE

        public static final int VALUESDIFFER_AVERAGE
        how to handle differing values: average.
        See Also:
        Constant Field Values
      • TAGS_VALUESDIFFER

        public static final weka.core.Tag[] TAGS_VALUESDIFFER
        the types of how to handle differing values.
      • ONEMISSING_MISSING

        public static final int ONEMISSING_MISSING
        what to do if one is missing: missing.
        See Also:
        Constant Field Values
      • ONEMISSING_USE_FIRST_PRESENT

        public static final int ONEMISSING_USE_FIRST_PRESENT
        what to do if one is missing: use first present value.
        See Also:
        Constant Field Values
      • TAGS_ONEMISSING

        public static final weka.core.Tag[] TAGS_ONEMISSING
      • DEFAULT_REMOVE_CHARS

        public static final String DEFAULT_REMOVE_CHARS
        characters to remove from start/end of the merged name.
        See Also:
        Constant Field Values
      • m_AttributeNames

        protected BaseString[] m_AttributeNames
        the attribute names.
      • m_Differ

        protected int m_Differ
        how to handle differing values.
      • m_OneMissing

        protected int m_OneMissing
        what to do if one value is missing.
      • m_RemoveChars

        protected String m_RemoveChars
        the characters to remove from the merged name (start/end).
      • m_MergedIndex

        protected weka.core.SingleIndex m_MergedIndex
        the position for the merged attribute (empty = leave at default position).
      • m_Merged

        protected String m_Merged
        the name of the merged attribute.
    • Constructor Detail

      • MergeManyAttributes

        public MergeManyAttributes()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • listOptions

        public Enumeration listOptions()
        Gets an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • getOptions

        public String[] getOptions()
        Returns the options of the current setup.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        the current options
      • setOptions

        public void setOptions​(String[] options)
                        throws Exception
        Parses the options for this object.

        Valid options are:

         -D
          Turns on output of debugging information.
         -att-name <att name>
          The name of the attribute, can be supplied multiple times.
         -remove-chars <chars>
          The characters to remove from the start/end of the
          generated name for the merged attribute.
          (default:  -_.)
         -merged-index <position>
          The new position for the merged attribute.
          Empty string is default position, i.e., the position of
          whichever merged attribute comes first
          (default: )
         -differ <MISSING|AVERAGE>
          The strategy to apply in case the values of the attributes differ.
          (default: MISSING)
         -one-missing <MISSING|PRESENT>
          The strategy to apply in case one of the values is missing.
          (default: MISSING)
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the options to use
        Throws:
        Exception - if the option setting fails
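As a rough illustration of the option format listed above, the following sketch assembles an option array using only the standard library. The helper class `MergeOptionsSketch` and the attribute names are hypothetical; only the flag names and their allowed values come from the option list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that only assembles the option strings listed above.
class MergeOptionsSketch {

  static String[] buildOptions(List<String> attNames, String differ, String oneMissing) {
    List<String> opts = new ArrayList<>();
    for (String name : attNames) {
      opts.add("-att-name");   // may be supplied multiple times
      opts.add(name);
    }
    opts.add("-differ");
    opts.add(differ);          // MISSING or AVERAGE
    opts.add("-one-missing");
    opts.add(oneMissing);      // MISSING or PRESENT
    return opts.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // "temp_in" and "temp_out" are made-up attribute names
    String[] opts = buildOptions(Arrays.asList("temp_in", "temp_out"), "AVERAGE", "PRESENT");
    System.out.println(String.join(" ", opts));
  }
}
```

With Weka and this filter on the classpath, an array like this would presumably be handed to `setOptions(String[])` on a `MergeManyAttributes` instance; that step is left out so the sketch runs without external dependencies.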
      • setAttributeNames

        public void setAttributeNames​(BaseString[] value)
        Sets the names of the attributes.
        Parameters:
        value - the names
      • getAttributeNames

        public BaseString[] getAttributeNames()
        Gets the names of the attributes.
        Returns:
        the names
      • attributeNamesTipText

        public String attributeNamesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMergedIndex

        public void setMergedIndex​(String value)
        Sets the position for the merged attribute.
        Parameters:
        value - the position, empty string for default
      • getMergedIndex

        public String getMergedIndex()
        Gets the position for the merged attribute.
        Returns:
        the position, empty string for default
      • mergedIndexTipText

        public String mergedIndexTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setRemoveChars

        public void setRemoveChars​(String value)
        Sets the characters to remove from start/end of the generated name.
        Parameters:
        value - the characters
      • getRemoveChars

        public String getRemoveChars()
        Gets the characters to remove from start/end of the generated name.
        Returns:
        the characters
      • removeCharsTipText

        public String removeCharsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDiffer

        public void setDiffer​(weka.core.SelectedTag value)
        Sets the type of strategy to apply if the two values differ.
        Parameters:
        value - the strategy
      • getDiffer

        public weka.core.SelectedTag getDiffer()
        Gets the type of strategy to apply if the two values differ.
        Returns:
        the strategy
      • differTipText

        public String differTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setOneMissing

        public void setOneMissing​(weka.core.SelectedTag value)
        Sets the type of strategy to apply if one of the values is missing.
        Parameters:
        value - the strategy
      • getOneMissing

        public weka.core.SelectedTag getOneMissing()
        Gets the type of strategy to apply if one of the values is missing.
        Returns:
        the strategy
      • oneMissingTipText

        public String oneMissingTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • commonSubsequence

        protected String commonSubsequence​(String s1,
                                           String s2,
                                           boolean forward)
        Determines the common subsequence of the two strings.
        Parameters:
        s1 - the first string
        s2 - the second string
        forward - if false, the strings are searched from back to front
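One plausible reading of this method is a shared prefix (forward) or shared suffix (backward) of the two strings. The sketch below implements that reading as a stand-in; the class `CommonPartSketch` is hypothetical, and the actual protected method may differ in detail.

```java
// Hypothetical stand-in for the protected commonSubsequence method.
class CommonPartSketch {

  /** Returns the shared prefix (forward) or shared suffix (backward). */
  static String commonPart(String s1, String s2, boolean forward) {
    int max = Math.min(s1.length(), s2.length());
    int n = 0;
    while (n < max) {
      // compare the n-th character counted from the start or from the end
      char c1 = forward ? s1.charAt(n) : s1.charAt(s1.length() - 1 - n);
      char c2 = forward ? s2.charAt(n) : s2.charAt(s2.length() - 1 - n);
      if (c1 != c2)
        break;
      n++;
    }
    return forward ? s1.substring(0, n) : s1.substring(s1.length() - n);
  }

  public static void main(String[] args) {
    System.out.println(commonPart("temp_in", "temp_out", true));   // temp_
    System.out.println(commonPart("in_temp", "out_temp", false));  // _temp
  }
}
```

Both results end or start with an underscore, which is the kind of residue the remove-chars setting is meant to strip from the generated name.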
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns this. In case the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, then this method will be called from batchFinished().
        Specified by:
        determineOutputFormat in class weka.filters.SimpleFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
        See Also:
        SimpleBatchFilter.hasImmediateOutputFormat(), SimpleBatchFilter.batchFinished()
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the Capabilities of this filter.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this object
        See Also:
        Capabilities
      • process

        protected weka.core.Instances process​(weka.core.Instances instances)
                                       throws Exception
        Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
        Specified by:
        process in class weka.filters.SimpleFilter
        Parameters:
        instances - the data to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong
        See Also:
        SimpleBatchFilter.batchFinished()
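The per-row merging that the `-differ` and `-one-missing` strategies imply can be sketched for numeric values as follows. This is an assumption-laden illustration, not the filter's code: the class `MergeValuesSketch` is hypothetical, `Double.NaN` stands in for a missing value, and the order in which the two strategies are checked is a guess.

```java
// Hypothetical sketch; NaN stands in for a missing value.
class MergeValuesSketch {

  static double merge(double[] values, boolean averageIfDiffer, boolean usePresentIfMissing) {
    int present = 0;
    double first = Double.NaN;
    double sum = 0.0;
    boolean differ = false;
    for (double v : values) {
      if (Double.isNaN(v))
        continue;                 // skip missing values
      if (present == 0)
        first = v;                // remember the first present value
      else if (v != first)
        differ = true;
      present++;
      sum += v;
    }
    if (present == 0)
      return Double.NaN;          // nothing to merge
    if (present < values.length)  // at least one value is missing
      return usePresentIfMissing ? first : Double.NaN;
    if (!differ)
      return first;               // all values agree
    return averageIfDiffer ? sum / present : Double.NaN;
  }

  public static void main(String[] args) {
    System.out.println(merge(new double[] {1.0, 3.0}, true, false));        // 2.0
    System.out.println(merge(new double[] {Double.NaN, 4.0}, true, true));  // 4.0
  }
}
```

Averaging only makes sense for numeric attributes; how the filter treats other attribute types is not stated here and is not modelled.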
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main​(String[] args)
        Runs the filter with the given arguments.
        Parameters:
        args - the commandline arguments