Class WeightsBasedResample
java.lang.Object
  weka.filters.Filter
    weka.filters.SimpleFilter
      weka.filters.SimpleBatchFilter
        weka.filters.unsupervised.instance.WeightsBasedResample
All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
public class WeightsBasedResample extends weka.filters.SimpleBatchFilter implements weka.filters.UnsupervisedFilter, weka.core.Randomizable
Normalizes all instance weights and drops the instances whose normalized weight falls below the specified threshold, but never more than the specified percentage of instances.
Of the remaining instances, the smallest weight (e.g., 0.2) represents one instance, which translates a weight of 1.0 into five instances. This multiplication factor can be capped to avoid an explosion in the number of instances when the smallest weight is very small.
The overall size of the final dataset can be limited as well.
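A minimal sketch of the copy-count arithmetic described above. It assumes the weights are already normalized and all positive, and that each copy count is obtained by rounding to the nearest integer. `CopyCountSketch` is a hypothetical, JDK-only illustration, not the filter's implementation.

```java
import java.util.Arrays;

// Hypothetical sketch, not the filter's source code: turns normalized weights into
// copy counts by treating the smallest weight as exactly one copy.
class CopyCountSketch {

    /** Returns the number of copies per instance; assumes every weight is > 0. */
    static int[] copies(double[] weights) {
        double smallest = Arrays.stream(weights).min().getAsDouble();
        double factor = 1.0 / smallest;  // e.g. smallest weight 0.2 -> factor 5
        int[] result = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            // assumption: round to the nearest whole number of copies
            result[i] = (int) Math.round(weights[i] * factor);
        }
        return result;
    }

    public static void main(String[] args) {
        // the example from the class description: 0.2 -> 1 copy, 1.0 -> 5 copies
        System.out.println(Arrays.toString(copies(new double[] {0.2, 1.0}))); // [1, 5]
    }
}
```

Rounding is only one plausible choice. Truncating or accumulating fractional remainders would give different counts for weights that are not exact multiples of the smallest one.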
Valid options are:
-drop-below <0.0-1.0>
  The threshold for the (normalized) weight below which instances get dropped.
  default: 0.0
-drop-at-most <0.0-1.0>
  The maximum percentage of instances to drop (0-1).
  default: 1.0
-max-factor <num>
  The maximum factor by which instances may be multiplied. Disabled if <= 0.
  default: -1
-size-limit <num>
  The size limit for the resulting dataset. Disabled if <= 0; a percentage if 0 < x <= 10 (0-10,000%); an absolute number of instances if > 10.
  default: -1
-seed <num>
  The seed value for randomizing the final dataset.
  default: 1
-output-debug-info
  If set, the filter is run in debug mode and may output additional info to the console.
-do-not-check-capabilities
  If set, filter capabilities are not checked before the filter is built (use with caution).
Author:
  fracpete (fracpete at waikato dot ac dot nz)
See Also:
  Serialized Form
Field Summary
Modifier and Type | Field | Description
protected double | m_DropAtMost | the maximum percentage (0-1) of instances to drop.
protected double | m_DropBelow | the threshold of weight below which to drop instances.
protected double | m_MaxFactor | the upper limit of the multiplication factor (<= 0 is not capped).
protected int | m_Seed | the seed for randomizing the final dataset.
protected double | m_SizeLimit | the maximum size of the dataset to generate (<= 0 is off, <= 10 is percentage, > 10 is absolute).
Constructor Summary
WeightsBasedResample()
Method Summary
Modifier and Type | Method | Description
protected weka.core.Instances | determineOutputFormat(weka.core.Instances inputFormat) | Determines the output format based on the input format and returns it.
String | dropAtMostTipText() | Returns the tip text for this property.
String | dropBelowTipText() | Returns the tip text for this property.
weka.core.Capabilities | getCapabilities() | Returns the Capabilities of this filter.
double | getDropAtMost() | Returns the maximum percentage of instances to drop.
double | getDropBelow() | Returns the threshold of the normalized weights below which to drop instances.
double | getMaxFactor() | Returns the upper limit for the multiplication factor for instances.
String[] | getOptions() | Gets the current settings of the filter.
String | getRevision() | Returns the revision string.
int | getSeed() | Gets the seed for the random number generation.
double | getSizeLimit() | Returns the size limit for the final dataset.
String | globalInfo() | Returns a string describing this filter.
Enumeration | listOptions() | Returns an enumeration describing the available options.
static void | main(String[] args) | Main method for testing this class.
String | maxFactorTipText() | Returns the tip text for this property.
protected weka.core.Instances | process(weka.core.Instances instances) | Processes the given data (may change the provided dataset) and returns the modified version.
String | seedTipText() | Returns the tip text for this property.
void | setDropAtMost(double value) | Sets the maximum percentage of instances to drop.
void | setDropBelow(double value) | Sets the threshold of the normalized weights below which to drop instances.
void | setMaxFactor(double value) | Sets the upper limit for the multiplication factor for instances.
void | setOptions(String[] options) | Parses a list of options for this object.
void | setSeed(int seed) | Sets the seed for random number generation.
void | setSizeLimit(double value) | Sets the size limit for the final dataset.
String | sizeLimitTipText() | Returns the tip text for this property.
Methods inherited from class weka.filters.SimpleBatchFilter
allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
Field Detail
m_DropBelow
protected double m_DropBelow
the threshold of weight below which to drop instances.
m_DropAtMost
protected double m_DropAtMost
the maximum percentage (0-1) of instances to drop.
m_MaxFactor
protected double m_MaxFactor
the upper limit of the multiplication factor (<= 0 is not capped).
m_SizeLimit
protected double m_SizeLimit
the maximum size of the dataset to generate (<= 0 is off, <= 10 is percentage, > 10 is absolute).
m_Seed
protected int m_Seed
the seed for randomizing the final dataset.
Method Detail
globalInfo
public String globalInfo()
Returns a string describing this filter.
Specified by:
  globalInfo in class weka.filters.SimpleFilter
Returns:
  a description of the filter suitable for displaying in the explorer/experimenter gui
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
Specified by:
  listOptions in interface weka.core.OptionHandler
Overrides:
  listOptions in class weka.filters.Filter
Returns:
  an enumeration of all the available options.
setOptions
public void setOptions(String[] options) throws Exception
Parses a list of options for this object.
Specified by:
  setOptions in interface weka.core.OptionHandler
Overrides:
  setOptions in class weka.filters.Filter
Parameters:
  options - the list of options as an array of strings
Throws:
  Exception - if an option is not supported
getOptions
public String[] getOptions()
Gets the current settings of the filter.
Specified by:
  getOptions in interface weka.core.OptionHandler
Overrides:
  getOptions in class weka.filters.Filter
Returns:
  an array of strings suitable for passing to setOptions
setDropBelow
public void setDropBelow(double value)
Sets the threshold of the normalized weights below which to drop instances.
Parameters:
  value - the threshold (0-1)
getDropBelow
public double getDropBelow()
Returns the threshold of the normalized weights below which to drop instances.
Returns:
  the threshold (0-1)
dropBelowTipText
public String dropBelowTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui
setDropAtMost
public void setDropAtMost(double value)
Sets the maximum percentage of instances to drop.
Parameters:
  value - the percentage (0-1)
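The interplay between the drop-below threshold and the drop-at-most cap can be sketched as follows. This is a hypothetical, stand-alone illustration, not WEKA code. It assumes that normalization divides every weight by the largest weight, that the lightest instances are dropped first, and that the cap equals floor(dropAtMost × number of instances).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "drop below the threshold, but at most a given percentage".
// Assumptions (not from the WEKA source): weights are normalized by the largest
// weight, the lightest instances are dropped first, and the cap is floor(p * n).
class DropSketch {

    /** Returns the indices of the surviving instances in their original order. */
    static List<Integer> keptIndices(double[] weights, double dropBelow, double dropAtMost) {
        int n = weights.length;
        double max = Arrays.stream(weights).max().getAsDouble();

        // visit instances from lightest to heaviest
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(weights[a], weights[b]));

        int maxDrops = (int) Math.floor(dropAtMost * n);
        boolean[] dropped = new boolean[n];
        int drops = 0;
        for (int idx : order) {
            // stop once the cap is reached or the normalized weight meets the threshold
            if (drops >= maxDrops || weights[idx] / max >= dropBelow) {
                break;
            }
            dropped[idx] = true;
            drops++;
        }

        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!dropped[i]) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] w = {0.05, 0.1, 1.0, 0.5};
        System.out.println(keptIndices(w, 0.2, 1.0));   // [2, 3]
        System.out.println(keptIndices(w, 0.2, 0.25));  // [1, 2, 3]
    }
}
```

With the defaults (drop-below 0.0, drop-at-most 1.0), no normalized weight is below the threshold, so every instance survives.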
getDropAtMost
public double getDropAtMost()
Returns the maximum percentage of instances to drop.
Returns:
  the percentage (0-1)
dropAtMostTipText
public String dropAtMostTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui
setMaxFactor
public void setMaxFactor(double value)
Sets the upper limit for the multiplication factor for instances. Disabled if <= 0.
Parameters:
  value - the upper limit
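A sketch of how a cap on the multiplication factor might be applied. It assumes, following the class description, that the uncapped factor is the reciprocal of the smallest remaining weight. `MaxFactorSketch` is a hypothetical helper, not part of the WEKA API.

```java
// Hypothetical sketch: caps the multiplication factor when maxFactor > 0.
// The uncapped factor (1 / smallest weight) follows the class description;
// how WEKA applies the cap internally is an assumption.
class MaxFactorSketch {

    /** Factor derived from the smallest weight, capped at maxFactor when maxFactor > 0. */
    static double effectiveFactor(double smallestWeight, double maxFactor) {
        double factor = 1.0 / smallestWeight;  // smallest weight maps to one copy
        if (maxFactor > 0) {                    // <= 0 means "not capped"
            factor = Math.min(factor, maxFactor);
        }
        return factor;
    }

    public static void main(String[] args) {
        // a smallest weight of 0.001 would multiply a weight of 1.0 by about 1000
        System.out.println(effectiveFactor(0.001, 20));  // 20.0
        System.out.println(effectiveFactor(0.001, -1));  // uncapped
    }
}
```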
getMaxFactor
public double getMaxFactor()
Returns the upper limit for the multiplication factor for instances. Disabled if <= 0.
Returns:
  the upper limit
maxFactorTipText
public String maxFactorTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui
setSizeLimit
public void setSizeLimit(double value)
Sets the size limit for the final dataset. Disabled if <= 0; a percentage if 0 < x <= 10 (0-10,000%); an absolute number of instances if > 10.
Parameters:
  value - the limit
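The three regimes of the size limit can be expressed as a small helper. It assumes that the relative regime is measured against the number of input instances, with 1.0 meaning 100%, and that the result is rounded to the nearest integer. `SizeLimitSketch.maxOutputSize` is a hypothetical name, not a WEKA method.

```java
// Hypothetical sketch of the size-limit regimes; assumes the relative regime is
// measured against the number of input instances (1.0 = 100%) and rounded.
class SizeLimitSketch {

    /** Returns the maximum number of output instances, or -1 if the limit is disabled. */
    static int maxOutputSize(double sizeLimit, int numInputInstances) {
        if (sizeLimit <= 0) {
            return -1;  // disabled (the default is -1)
        }
        if (sizeLimit <= 10) {
            // relative: e.g. 0.5 -> 50% of the input, 2 -> 200%
            return (int) Math.round(sizeLimit * numInputInstances);
        }
        return (int) sizeLimit;  // absolute number of instances
    }

    public static void main(String[] args) {
        System.out.println(maxOutputSize(-1, 1000));   // -1
        System.out.println(maxOutputSize(0.5, 1000));  // 500
        System.out.println(maxOutputSize(2, 1000));    // 2000
        System.out.println(maxOutputSize(250, 1000));  // 250
    }
}
```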
getSizeLimit
public double getSizeLimit()
Returns the size limit for the final dataset. Disabled if <= 0; a percentage if 0 < x <= 10 (0-10,000%); an absolute number of instances if > 10.
Returns:
  the limit
sizeLimitTipText
public String sizeLimitTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui
setSeed
public void setSeed(int seed)
Sets the seed for random number generation.
Specified by:
  setSeed in interface weka.core.Randomizable
Parameters:
  seed - the seed
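The seed matters because the final dataset is randomized: with the same seed, the same input yields the same order. The sketch below uses `java.util.Collections.shuffle` with a seeded `java.util.Random` as a stand-in. Whether the filter shuffles this way internally is an assumption; WEKA's `Instances` class does provide a `randomize(Random)` method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: a seeded shuffle makes the randomized output order reproducible.
class SeedSketch {

    /** Returns a shuffled copy of the data; the input list is left unchanged. */
    static <T> List<T> shuffled(List<T> data, int seed) {
        List<T> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        // same seed (the option's default is 1), same order on every run
        System.out.println(shuffled(data, 1).equals(shuffled(data, 1))); // true
    }
}
```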
getSeed
public int getSeed()
Gets the seed for the random number generation.
Specified by:
  getSeed in interface weka.core.Randomizable
Returns:
  the seed for the random number generation
seedTipText
public String seedTipText()
Returns the tip text for this property.
Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui
getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter.
Specified by:
  getCapabilities in interface weka.core.CapabilitiesHandler
Overrides:
  getCapabilities in class weka.filters.Filter
Returns:
  the capabilities of this object
See Also:
  Capabilities
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it.
Specified by:
  determineOutputFormat in class weka.filters.SimpleFilter
Parameters:
  inputFormat - the input format to base the output format on
Returns:
  the output format
Throws:
  Exception - in case the determination goes wrong
process
protected weka.core.Instances process(weka.core.Instances instances) throws Exception
Processes the given data (may change the provided dataset) and returns the modified version. This method is called in batchFinished().
Specified by:
  process in class weka.filters.SimpleFilter
Parameters:
  instances - the data to process
Returns:
  the modified data
Throws:
  Exception - in case the processing goes wrong
getRevision
public String getRevision()
Returns the revision string.
Specified by:
  getRevision in interface weka.core.RevisionHandler
Overrides:
  getRevision in class weka.filters.Filter
Returns:
  the revision
main
public static void main(String[] args)
Main method for testing this class.
Parameters:
  args - should contain arguments to the filter: use -h for help