Class WeightsBasedResample

  • All Implemented Interfaces:
    Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter

    public class WeightsBasedResample
    extends weka.filters.SimpleBatchFilter
    implements weka.filters.UnsupervisedFilter, weka.core.Randomizable
    Normalizes all instance weights and drops the instances whose normalized weight falls below the specified threshold, dropping at most the specified percentage of instances.
    Among the remaining instances, the smallest weight (e.g., 0.2) represents one instance, so a weight of 1.0 translates to five instances. This multiplication factor can be capped to avoid an instance explosion when the smallest weight is very small.
    The overall size of the final dataset can be limited as well.
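
The weight-to-copies arithmetic described above can be sketched in plain Java. This is an illustration only, not the filter's actual implementation: the class name WeightFactorSketch, the method copiesPerInstance, the rounding to the nearest integer, and the rule that every kept instance yields at least one copy are all assumptions.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and rounding behaviour are assumptions. */
class WeightFactorSketch {

  /**
   * Converts normalized weights into copy counts. The smallest weight stands
   * for one instance, so the multiplication factor is 1 / min(weights).
   * A maxFactor <= 0 means "not capped", mirroring the -max-factor option.
   */
  static int[] copiesPerInstance(double[] weights, double maxFactor) {
    double min = Arrays.stream(weights).min()
        .orElseThrow(() -> new IllegalArgumentException("no weights given"));
    if (min <= 0) {
      throw new IllegalArgumentException("weights must be positive");
    }
    double factor = 1.0 / min;
    if (maxFactor > 0 && factor > maxFactor) {
      factor = maxFactor; // cap the factor to avoid an instance explosion
    }
    int[] copies = new int[weights.length];
    for (int i = 0; i < weights.length; i++) {
      // Assumption: round to nearest, but never produce fewer than one copy.
      copies[i] = (int) Math.max(1L, Math.round(weights[i] * factor));
    }
    return copies;
  }

  public static void main(String[] args) {
    double[] weights = {0.2, 0.4, 1.0};
    // Uncapped: factor is 1 / 0.2 = 5
    System.out.println(Arrays.toString(copiesPerInstance(weights, -1))); // [1, 2, 5]
    // Capped at a factor of 2
    System.out.println(Arrays.toString(copiesPerInstance(weights, 2)));  // [1, 1, 2]
  }
}
```

With the cap in place, very light instances collapse to a single copy, which is why the cap trades fidelity to the weights for a bounded dataset size.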

    Valid options are:

     -drop-below <0.0-1.0>
      The threshold for the (normalized) weight below which instances
      get dropped.
      default: 0.0
     -drop-at-most <0.0-1.0>
      The maximum percentage of instances to drop (0-1).
      default: 1.0
     -max-factor <num>
      The maximum factor to allow for instances to be multiplied with.
      Disabled if <= 0.
      default: -1
     -size-limit <num>
      The size limit for the resulting dataset.
      Disabled if <= 0, percentage if 0<x<=10 (0-10,000%),
      >10 absolute number of instances.
      default: -1
     -seed <num>
      The seed value for randomizing the final dataset.
      default: 1
     -output-debug-info
      If set, filter is run in debug mode and
      may output additional info to the console
     -do-not-check-capabilities
      If set, filter capabilities are not checked before filter is built
      (use with caution).
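
The three-way interpretation of the -size-limit value can be sketched as follows. This is a hedged illustration, not the filter's code: the class SizeLimitSketch and the method maxInstances are hypothetical, and it assumes that values in the range 0<x<=10 are a fraction of the input dataset size with 1.0 meaning 100%. The documented percentage range may imply a different scaling.

```java
/** Illustrative sketch only; the percentage scaling is an assumption. */
class SizeLimitSketch {

  /**
   * Resolves a size-limit setting to a maximum number of instances.
   *
   * @param sizeLimit the option value (<= 0 disabled, 0<x<=10 relative, >10 absolute)
   * @param inputSize the number of instances in the input dataset
   * @return the maximum number of instances, or -1 if no limit applies
   */
  static int maxInstances(double sizeLimit, int inputSize) {
    if (sizeLimit <= 0) {
      return -1; // limit disabled
    }
    if (sizeLimit <= 10) {
      // Assumption: relative to the input size, 1.0 == 100%.
      return (int) Math.round(sizeLimit * inputSize);
    }
    return (int) sizeLimit; // absolute number of instances
  }

  public static void main(String[] args) {
    System.out.println(maxInstances(-1, 200));  // disabled
    System.out.println(maxInstances(0.5, 200)); // relative limit
    System.out.println(maxInstances(500, 200)); // absolute limit
  }
}
```

Note the consequence of sharing one option for both modes: an absolute limit of 10 instances or fewer cannot be expressed, because such values are read as a relative limit.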
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected double m_DropAtMost
      the maximum percentage (0-1) of instances to drop.
      protected double m_DropBelow
      the threshold of weight below which to drop instances.
      protected double m_MaxFactor
      the upper limit of the multiplication factor (<= 0 is not capped).
      protected int m_Seed
      the seed for randomizing the final dataset.
      protected double m_SizeLimit
      the maximum size of the dataset to generate (<= 0 is off, <= 10 is percentage, > 10 is absolute).
      • Fields inherited from class weka.filters.Filter

        m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
    • Method Summary

      Modifier and Type Method Description
      protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
      Determines the output format based on the input format and returns it.
      String dropAtMostTipText()
      Returns the tip text for this property.
      String dropBelowTipText()
      Returns the tip text for this property.
      weka.core.Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      double getDropAtMost()
      Returns the maximum percentage of instances to drop.
      double getDropBelow()
      Returns the threshold of the normalized weights below which to drop instances.
      double getMaxFactor()
      Returns the upper limit for the multiplication factor for instances.
      String[] getOptions()
      Gets the current settings of the filter.
      String getRevision()
      Returns the revision string.
      int getSeed()
      Gets the seed for random number generation.
      double getSizeLimit()
      Returns the size limit for the final dataset.
      String globalInfo()
      Returns a string describing this filter.
      Enumeration listOptions()
      Returns an enumeration describing the available options.
      static void main​(String[] args)
      Main method for testing this class.
      String maxFactorTipText()
      Returns the tip text for this property.
      protected weka.core.Instances process​(weka.core.Instances instances)
      Processes the given data (may change the provided dataset) and returns the modified version.
      String seedTipText()
      Returns the tip text for this property.
      void setDropAtMost​(double value)
      Sets the maximum percentage of instances to drop.
      void setDropBelow​(double value)
      Sets the threshold of the normalized weights below which to drop instances.
      void setMaxFactor​(double value)
      Sets the upper limit for the multiplication factor for instances.
      void setOptions​(String[] options)
      Parses a list of options for this object.
      void setSeed​(int seed)
      Set the seed for random number generation.
      void setSizeLimit​(double value)
      Sets the size limit for the final dataset.
      String sizeLimitTipText()
      Returns the tip text for this property.
      • Methods inherited from class weka.filters.SimpleBatchFilter

        allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input, input
      • Methods inherited from class weka.filters.SimpleFilter

        reset, setInputFormat
      • Methods inherited from class weka.filters.Filter

        batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
    • Field Detail

      • m_DropBelow

        protected double m_DropBelow
        the threshold of weight below which to drop instances.
      • m_DropAtMost

        protected double m_DropAtMost
        the maximum percentage (0-1) of instances to drop.
      • m_MaxFactor

        protected double m_MaxFactor
        the upper limit of the multiplication factor (<= 0 is not capped).
      • m_SizeLimit

        protected double m_SizeLimit
        the maximum size of the dataset to generate (<= 0 is off, <= 10 is percentage, > 10 is absolute).
      • m_Seed

        protected int m_Seed
        the seed for randomizing the final dataset.
    • Constructor Detail

      • WeightsBasedResample

        public WeightsBasedResample()
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this filter.
        Specified by:
        globalInfo in class weka.filters.SimpleFilter
        Returns:
        a description of the filter suitable for displaying in the explorer/experimenter gui
      • listOptions

        public Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.filters.Filter
        Returns:
        an enumeration of all the available options.
      • setOptions

        public void setOptions​(String[] options)
                        throws Exception
        Parses a list of options for this object.
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.filters.Filter
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • getOptions

        public String[] getOptions()
        Gets the current settings of the filter.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.filters.Filter
        Returns:
        an array of strings suitable for passing to setOptions
      • setDropBelow

        public void setDropBelow​(double value)
        Sets the threshold of the normalized weights below which to drop instances.
        Parameters:
        value - the threshold (0-1)
      • getDropBelow

        public double getDropBelow()
        Returns the threshold of the normalized weights below which to drop instances.
        Returns:
        the threshold (0-1)
      • dropBelowTipText

        public String dropBelowTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDropAtMost

        public void setDropAtMost​(double value)
        Sets the maximum percentage of instances to drop.
        Parameters:
        value - the percentage (0-1)
      • getDropAtMost

        public double getDropAtMost()
        Returns the maximum percentage of instances to drop.
        Returns:
        the percentage (0-1)
      • dropAtMostTipText

        public String dropAtMostTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
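
How the drop-below threshold and the drop-at-most cap interact can be sketched in plain Java. This is an illustration under stated assumptions, not the filter's implementation: the class DropSketch and the method keptIndices are hypothetical, the weights are assumed to be normalized already, the cap is rounded down to a whole number of instances, and when the cap bites the lightest candidates are assumed to be dropped first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; tie-breaking and rounding are assumptions. */
class DropSketch {

  /**
   * Returns the indices of the instances that are kept, in original order.
   *
   * @param normalized the already normalized instance weights
   * @param dropBelow  weights strictly below this threshold are drop candidates
   * @param dropAtMost the maximum fraction (0-1) of instances that may be dropped
   */
  static List<Integer> keptIndices(double[] normalized, double dropBelow, double dropAtMost) {
    int maxDrops = (int) Math.floor(dropAtMost * normalized.length);

    // Collect drop candidates and order them lightest first.
    List<Integer> candidates = new ArrayList<>();
    for (int i = 0; i < normalized.length; i++) {
      if (normalized[i] < dropBelow) {
        candidates.add(i);
      }
    }
    candidates.sort(Comparator.comparingDouble(i -> normalized[i]));

    // Drop candidates until the cap is reached.
    boolean[] dropped = new boolean[normalized.length];
    for (int k = 0; k < candidates.size() && k < maxDrops; k++) {
      dropped[candidates.get(k)] = true;
    }

    List<Integer> kept = new ArrayList<>();
    for (int i = 0; i < normalized.length; i++) {
      if (!dropped[i]) {
        kept.add(i);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    double[] w = {0.05, 0.5, 0.02, 1.0, 0.08};
    System.out.println(keptIndices(w, 0.1, 1.0)); // all three light instances dropped
    System.out.println(keptIndices(w, 0.1, 0.5)); // cap allows only two drops
  }
}
```

With the defaults (threshold 0.0, cap 1.0) no weight is below the threshold, so nothing is dropped.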
      • setMaxFactor

        public void setMaxFactor​(double value)
        Sets the upper limit for the multiplication factor for instances. Disabled if <= 0.
        Parameters:
        value - the upper limit
      • getMaxFactor

        public double getMaxFactor()
        Returns the upper limit for the multiplication factor for instances. Disabled if <= 0.
        Returns:
        the upper limit
      • maxFactorTipText

        public String maxFactorTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSizeLimit

        public void setSizeLimit​(double value)
        Sets the size limit for the final dataset. Disabled if <= 0, percentage if 0<x<=10 (0-10,000%), >10 absolute number of instances.
        Parameters:
        value - the limit
      • getSizeLimit

        public double getSizeLimit()
        Returns the size limit for the final dataset. Disabled if <= 0, percentage if 0<x<=10 (0-10,000%), >10 absolute number of instances.
        Returns:
        the limit
      • sizeLimitTipText

        public String sizeLimitTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setSeed

        public void setSeed​(int seed)
        Set the seed for random number generation.
        Specified by:
        setSeed in interface weka.core.Randomizable
        Parameters:
        seed - the seed
      • getSeed

        public int getSeed()
        Gets the seed for random number generation.
        Specified by:
        getSeed in interface weka.core.Randomizable
        Returns:
        the seed for the random number generation
      • seedTipText

        public String seedTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
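
Why a fixed seed matters for the final randomization can be shown with a small, self-contained sketch. It uses java.util.Collections.shuffle on a list of strings as a stand-in for the dataset; the class SeedSketch and the method shuffled are hypothetical, and the filter itself may randomize its output differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; a list of strings stands in for the dataset. */
class SeedSketch {

  /** Returns a shuffled copy of the rows; the same seed gives the same order. */
  static List<String> shuffled(List<String> rows, int seed) {
    List<String> copy = new ArrayList<>(rows); // leave the input untouched
    Collections.shuffle(copy, new Random(seed));
    return copy;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("a", "b", "c", "d", "e");
    // Two runs with the same seed produce identical orderings.
    System.out.println(shuffled(rows, 1).equals(shuffled(rows, 1))); // true
  }
}
```

Keeping the seed fixed makes a resampling run reproducible, while changing it yields a different ordering of the same instances.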
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns the Capabilities of this filter.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Overrides:
        getCapabilities in class weka.filters.Filter
        Returns:
        the capabilities of this object
        See Also:
        Capabilities
      • determineOutputFormat

        protected weka.core.Instances determineOutputFormat​(weka.core.Instances inputFormat)
                                                     throws Exception
        Determines the output format based on the input format and returns it.
        Specified by:
        determineOutputFormat in class weka.filters.SimpleFilter
        Parameters:
        inputFormat - the input format to base the output format on
        Returns:
        the output format
        Throws:
        Exception - in case the determination goes wrong
      • process

        protected weka.core.Instances process​(weka.core.Instances instances)
                                       throws Exception
        Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
        Specified by:
        process in class weka.filters.SimpleFilter
        Parameters:
        instances - the data to process
        Returns:
        the modified data
        Throws:
        Exception - in case the processing goes wrong
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.filters.Filter
        Returns:
        the revision
      • main

        public static void main​(String[] args)
        Main method for testing this class.
        Parameters:
        args - should contain arguments to the filter: use -h for help