Class MergeTwoAttributes

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter

    public class MergeTwoAttributes
    extends weka.filters.SimpleBatchFilter
    implements weka.filters.UnsupervisedFilter
    Merges two attributes and offers various strategies for handling values that differ or are missing.
    Uses the common subsequence (either from start or end) of the two attribute names as the name of the merged attribute, otherwise the concatenation of both names (separated by '-'). If this new name is already present, a number is appended to the name to make it unique.
    The merged attribute can either be left at the default position (either first or second attribute, whichever comes first) or moved to a specific one.
    If one of the two attributes to be merged is the current class attribute, the newly created merged attribute will become the new class attribute.
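
    The naming rules above can be illustrated with a small, self-contained sketch. It is not the filter's actual code: the class MergedNameSketch, its method names, and the format of the appended counter (starting at 2, no separator) are assumptions made for illustration. The common part of the two names is taken as an input here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper illustrating the naming rules; not part of the filter's API. */
final class MergedNameSketch {

  /** Strips any of the given characters from both ends of the name. */
  static String trimChars(String name, String removeChars) {
    int start = 0;
    int end = name.length();
    while (start < end && removeChars.indexOf(name.charAt(start)) >= 0) {
      start++;
    }
    while (end > start && removeChars.indexOf(name.charAt(end - 1)) >= 0) {
      end--;
    }
    return name.substring(start, end);
  }

  /**
   * Picks a name for the merged attribute: the cleaned-up common part if it
   * is non-empty, otherwise "first-second". A counter is appended until the
   * name is unused (the counter format is an assumption).
   */
  static String mergedName(String common, String first, String second,
                           String removeChars, Set<String> existing) {
    String base = trimChars(common, removeChars);
    if (base.isEmpty()) {
      base = first + "-" + second;
    }
    String name = base;
    int counter = 2;
    while (existing.contains(name)) {
      name = base + counter;
      counter++;
    }
    return name;
  }

  public static void main(String[] args) {
    // names of the other attributes in the dataset (made-up example data)
    Set<String> existing = new LinkedHashSet<>();
    existing.add("id");
    // " -_." mirrors the documented default of -remove-chars
    System.out.println(mergedName("temp_", "temp_a", "temp_b", " -_.", existing));
    System.out.println(mergedName("", "height", "width", " -_.", existing));
  }
}
```

    With the made-up names above, the first call yields "temp" (trailing '_' removed) and the second falls back to "height-width".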

    Valid options are:

     -D
      Turns on output of debugging information.
     -first <att name>
      The name of the first attribute.
      (default: att1)
     -second <att name>
      The name of the second attribute.
      (default: att2)
     -remove-chars <chars>
      The characters to remove from the start/end of the
      generated name for the merged attribute.
      (default:  -_.)
     -merged-index <position>
      The new position for the merged attribute.
      Empty string is default position, i.e., either the position
      of the first or second attribute (whichever comes first)
      (default: )
     -differ <MISSING|AVERAGE|FIRST|SECOND>
      The strategy to apply in case the values of the two attributes differ.
      (default: MISSING)
     -one-missing <MISSING|PRESENT>
      The strategy to apply in case one of the two values is missing.
      (default: MISSING)
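
    How the -differ and -one-missing strategies might combine for a pair of numeric values is sketched below. This is an illustration under stated assumptions, not the filter's implementation: the class MergeValueSketch and its enums are hypothetical, missing values are modelled as NaN, and the handling of two missing values (result: missing) is assumed.

```java
/** Hypothetical illustration of the -differ and -one-missing strategies. */
final class MergeValueSketch {

  enum Differ { MISSING, AVERAGE, FIRST, SECOND }

  enum OneMissing { MISSING, PRESENT }

  /** Missing values are modelled as NaN in this sketch. */
  static final double MISSING_VALUE = Double.NaN;

  static double merge(double first, double second,
                      Differ differ, OneMissing oneMissing) {
    boolean firstMissing = Double.isNaN(first);
    boolean secondMissing = Double.isNaN(second);

    // assumption: nothing to merge if both values are missing
    if (firstMissing && secondMissing) {
      return MISSING_VALUE;
    }
    // exactly one value is missing: apply the -one-missing strategy
    if (firstMissing || secondMissing) {
      if (oneMissing == OneMissing.PRESENT) {
        return firstMissing ? second : first;
      }
      return MISSING_VALUE;
    }
    // both present and equal: no conflict
    if (first == second) {
      return first;
    }
    // both present but different: apply the -differ strategy
    switch (differ) {
      case AVERAGE:
        return (first + second) / 2.0;
      case FIRST:
        return first;
      case SECOND:
        return second;
      default:
        return MISSING_VALUE;
    }
  }

  public static void main(String[] args) {
    System.out.println(merge(1.0, 3.0, Differ.AVERAGE, OneMissing.MISSING));
    System.out.println(merge(Double.NaN, 3.0, Differ.MISSING, OneMissing.PRESENT));
  }
}
```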
    Version:
    $Revision$
    Author:
    FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Method Summary

      Modifier and Type Method Description
      protected String commonSubsequence​(String s1, String s2, boolean forward)
      Determines the common subsequence of the two strings.
      protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns this.
      String differTipText()
      Returns the tip text for this property.
      String firstAttributeTipText()
      Returns the tip text for this property.
      weka.core.Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      weka.core.SelectedTag getDiffer()
      Gets the type of strategy to apply if the two values differ.
      String getFirstAttribute()
      Gets the name of the first attribute.
      String getMergedIndex()
      Gets the position for the merged attribute.
      weka.core.SelectedTag getOneMissing()
      Gets the type of strategy to apply if one of the values is missing.
      String[] getOptions()
      Returns the options of the current setup.
      String getRemoveChars()
      Gets the characters to remove from start/end of the generated name.
      String getRevision()
      Returns the revision string.
      String getSecondAttribute()
      Gets the name of the second attribute.
      String globalInfo()
      Returns a string describing this filter.
      Enumeration listOptions()
      Gets an enumeration describing the available options.
      static void main​(String[] args)
      Runs the filter with the given arguments.
      String mergedIndexTipText()
      Returns the tip text for this property.
      String oneMissingTipText()
      Returns the tip text for this property.
      protected weka.core.Instances process​(weka.core.Instances instances)
      Processes the given data (may change the provided dataset) and returns the modified version.
      String removeCharsTipText()
      Returns the tip text for this property.
      String secondAttributeTipText()
      Returns the tip text for this property.
      void setDiffer​(weka.core.SelectedTag value)
      Sets the type of strategy to apply if the two values differ.
      void setFirstAttribute​(String value)
      Sets the name of the first attribute.
      void setMergedIndex​(String value)
      Sets the position for the merged attribute.
      void setOneMissing​(weka.core.SelectedTag value)
      Sets the type of strategy to apply if one of the values is missing.
      void setOptions​(String[] options)
      Parses the options for this object.
      void setRemoveChars​(String value)
      Sets the characters to remove from start/end of the generated name.
      void setSecondAttribute​(String value)
      Sets the name of the second attribute.
      • Methods inherited from class weka.filters.SimpleBatchFilter

        allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input
      • Methods inherited from class weka.filters.SimpleFilter

        reset, setInputFormat
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • VALUESDIFFER_MISSING

        public static final int VALUESDIFFER_MISSING
        how to handle differing values: missing.
        See Also:
        Constant Field Values
      • VALUESDIFFER_AVERAGE

        public static final int VALUESDIFFER_AVERAGE
        how to handle differing values: average.
        See Also:
        Constant Field Values
      • VALUESDIFFER_FIRST

        public static final int VALUESDIFFER_FIRST
        how to handle differing values: first.
        See Also:
        Constant Field Values
      • VALUESDIFFER_SECOND

        public static final int VALUESDIFFER_SECOND
        how to handle differing values: second.
        See Also:
        Constant Field Values
      • TAGS_VALUESDIFFER

        public static final weka.core.Tag[] TAGS_VALUESDIFFER
        the types of how to handle differing values.
      • ONEMISSING_MISSING

        public static final int ONEMISSING_MISSING
        what to do if one is missing: missing.
        See Also:
        Constant Field Values
      • ONEMISSING_USE_PRESENT

        public static final int ONEMISSING_USE_PRESENT
        what to do if one is missing: use present value.
        See Also:
        Constant Field Values
      • TAGS_ONEMISSING

        public static final weka.core.Tag[] TAGS_ONEMISSING
        the types of how to handle one missing value.
      • DEFAULT_REMOVE_CHARS

        public static final String DEFAULT_REMOVE_CHARS
        characters to remove from start/end of the merged name.
        See Also:
        Constant Field Values
      • DEFAULT_NAME_SECOND

        public static final String DEFAULT_NAME_SECOND
        the default second attribute name.
        See Also:
        Constant Field Values
      • m_FirstAttribute

        protected String m_FirstAttribute
        the name of the first attribute.
      • m_SecondAttribute

        protected String m_SecondAttribute
        the name of the second attribute.
      • m_Differ

        protected int m_Differ
        how to handle differing values.
      • m_OneMissing

        protected int m_OneMissing
        what to do if one value is missing.
      • m_RemoveChars

        protected String m_RemoveChars
        the characters to remove from the merged name (start/end).
      • m_MergedIndex

        protected weka.core.SingleIndex m_MergedIndex
        the position for the merged attribute (empty = leave at default position).
      • m_Merged

        protected String m_Merged
        the name of the merged attribute.
    • Constructor Detail

      • MergeTwoAttributes

        public MergeTwoAttributes()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • listOptions

        public Enumeration listOptions()
        Gets an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • getOptions

        public String[] getOptions()
        Returns the options of the current setup.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        the current options
      • setOptions

        public void setOptions​(String[] options)
                        throws Exception
        Parses the options for this object.

        Valid options are:

         -D
          Turns on output of debugging information.
         -first <att name>
          The name of the first attribute.
          (default: att1)
         -second <att name>
          The name of the second attribute.
          (default: att2)
         -remove-chars <chars>
          The characters to remove from the start/end of the
          generated name for the merged attribute.
          (default:  -_.)
         -merged-index <position>
          The new position for the merged attribute.
          Empty string is default position, i.e., either the position
          of the first or second attribute (whichever comes first)
          (default: )
         -differ <MISSING|AVERAGE|FIRST|SECOND>
          The strategy to apply in case the values of the two attributes differ.
          (default: MISSING)
         -one-missing <MISSING|PRESENT>
          The strategy to apply in case one of the two values is missing.
          (default: MISSING)
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the options to use
        Throws:
        Exception - if the option setting fails
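
        As a rough illustration of the option list above, the following sketch assembles an options array of the shape that setOptions(String[]) is documented to parse. The class MergeOptionsSketch and its buildOptions helper are hypothetical, and leaving out -merged-index for an empty position is an assumption. The attribute names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles an options array; not part of the filter's API. */
final class MergeOptionsSketch {

  static String[] buildOptions(String first, String second, String differ,
                               String oneMissing, String mergedIndex) {
    List<String> options = new ArrayList<>();
    options.add("-first");
    options.add(first);
    options.add("-second");
    options.add(second);
    options.add("-differ");
    options.add(differ);        // MISSING, AVERAGE, FIRST or SECOND
    options.add("-one-missing");
    options.add(oneMissing);    // MISSING or PRESENT
    // assumption: an empty position means "default", so the flag is left out
    if (!mergedIndex.isEmpty()) {
      options.add("-merged-index");
      options.add(mergedIndex);
    }
    return options.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] options = buildOptions("temp_a", "temp_b", "AVERAGE", "PRESENT", "");
    // an array like this could then be handed to setOptions(String[])
    System.out.println(Arrays.toString(options));
  }
}
```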
      • setFirstAttribute

        public void setFirstAttribute​(String value)
        Sets the name of the first attribute.
        Parameters:
        value - the name
      • getFirstAttribute

        public String getFirstAttribute()
        Gets the name of the first attribute.
        Returns:
        the name
      • firstAttributeTipText

        public String firstAttributeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSecondAttribute

        public void setSecondAttribute​(String value)
        Sets the name of the second attribute.
        Parameters:
        value - the name
      • getSecondAttribute

        public String getSecondAttribute()
        Gets the name of the second attribute.
        Returns:
        the name
      • secondAttributeTipText

        public String secondAttributeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMergedIndex

        public void setMergedIndex​(String value)
        Sets the position for the merged attribute.
        Parameters:
        value - the position, empty string for default
      • getMergedIndex

        public String getMergedIndex()
        Gets the position for the merged attribute.
        Returns:
        the position, empty string for default
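
        The position rule (empty string means the earlier of the two original positions) can be sketched as follows. This is a hypothetical helper, not the filter's code: the class MergedIndexSketch is made up, and the accepted non-empty values ("first", "last", or a 1-based number, in the style of weka.core.SingleIndex) and the bounds check are assumptions.

```java
/** Hypothetical resolution of the merged attribute's position. */
final class MergedIndexSketch {

  /**
   * @param mergedIndex "" for the default position, "first", "last",
   *                    or a 1-based number (assumed input format)
   * @param firstPos    0-based position of the first attribute
   * @param secondPos   0-based position of the second attribute
   * @param numAfter    number of attributes after the merge
   * @return the 0-based position of the merged attribute
   */
  static int resolve(String mergedIndex, int firstPos, int secondPos, int numAfter) {
    String value = mergedIndex.trim().toLowerCase();
    if (value.isEmpty()) {
      // default: whichever of the two attributes comes first
      return Math.min(firstPos, secondPos);
    }
    if (value.equals("first")) {
      return 0;
    }
    if (value.equals("last")) {
      return numAfter - 1;
    }
    int position = Integer.parseInt(value) - 1;  // 1-based to 0-based
    if (position < 0 || position >= numAfter) {
      throw new IllegalArgumentException("Position out of range: " + mergedIndex);
    }
    return position;
  }

  public static void main(String[] args) {
    // attributes at positions 4 and 2, five attributes left after merging
    System.out.println(resolve("", 4, 2, 5));
    System.out.println(resolve("last", 4, 2, 5));
  }
}
```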
      • mergedIndexTipText

        public String mergedIndexTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setRemoveChars

        public void setRemoveChars​(String value)
        Sets the characters to remove from start/end of the generated name.
        Parameters:
        value - the characters
      • getRemoveChars

        public String getRemoveChars()
        Gets the characters to remove from start/end of the generated name.
        Returns:
        the characters
      • removeCharsTipText

        public String removeCharsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDiffer

        public void setDiffer​(weka.core.SelectedTag value)
        Sets the type of strategy to apply if the two values differ.
        Parameters:
        value - the strategy
      • getDiffer

        public weka.core.SelectedTag getDiffer()
        Gets the type of strategy to apply if the two values differ.
        Returns:
        the strategy
      • differTipText

        public String differTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setOneMissing

        public void setOneMissing​(weka.core.SelectedTag value)
        Sets the type of strategy to apply if one of the values is missing.
        Parameters:
        value - the strategy
      • getOneMissing

        public weka.core.SelectedTag getOneMissing()
        Gets the type of strategy to apply if one of the values is missing.
        Returns:
        the strategy
      • oneMissingTipText

        public String oneMissingTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • commonSubsequence

        protected String commonSubsequence​(String s1,
                                           String s2,
                                           boolean forward)
        Determines the common subsequence of the two strings.
        Parameters:
        s1 - the first string
        s2 - the second string
        forward - if false, the strings are searched from back to front
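
        One plausible reading of this method is a common prefix (forward) or common suffix (backward) of the two strings. The sketch below follows that reading. It is an assumption about the behaviour, not the actual implementation, and the class CommonSubsequenceSketch is hypothetical.

```java
/** Hypothetical sketch: common prefix (forward) or suffix (backward) of two strings. */
final class CommonSubsequenceSketch {

  static String commonSubsequence(String s1, String s2, boolean forward) {
    int max = Math.min(s1.length(), s2.length());
    int n = 0;
    if (forward) {
      // compare from the start until the first mismatch
      while (n < max && s1.charAt(n) == s2.charAt(n)) {
        n++;
      }
      return s1.substring(0, n);
    }
    // compare from the end until the first mismatch
    while (n < max
        && s1.charAt(s1.length() - 1 - n) == s2.charAt(s2.length() - 1 - n)) {
      n++;
    }
    return s1.substring(s1.length() - n);
  }

  public static void main(String[] args) {
    System.out.println(commonSubsequence("temp_a", "temp_b", true));   // temp_
    System.out.println(commonSubsequence("a_temp", "b_temp", false));  // _temp
  }
}
```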
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns this. In case the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, then this method will be called from batchFinished().
        Specified by:
        determineOutputFormat in class weka.filters.SimpleFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
        See Also:
        SimpleBatchFilter.hasImmediateOutputFormat(), SimpleBatchFilter.batchFinished()
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the Capabilities of this filter.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this object
        See Also:
        Capabilities
      • process

        protected weka.core.Instances process​(weka.core.Instances instances)
                                       throws Exception
        Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
        Specified by:
        process in class weka.filters.SimpleFilter
        Parameters:
        instances - the data to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong
        See Also:
        SimpleBatchFilter.batchFinished()
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main​(String[] args)
        Runs the filter with the given arguments.
        Parameters:
        args - the commandline arguments