Class WeightsBasedResample
- java.lang.Object
  - weka.filters.Filter
    - weka.filters.SimpleFilter
      - weka.filters.SimpleBatchFilter
        - weka.filters.unsupervised.instance.WeightsBasedResample

- All Implemented Interfaces:
  Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler, weka.filters.UnsupervisedFilter
public class WeightsBasedResample extends weka.filters.SimpleBatchFilter implements weka.filters.UnsupervisedFilter, weka.core.Randomizable
Normalizes all instance weights and drops the instances whose normalized weight falls below the specified threshold, dropping at most the specified percentage of instances.
Of the remaining instances, the smallest weight (e.g., 0.2) represents one instance, which translates a weight of 1.0 into five instances. This multiplication factor can be capped to avoid an explosion in the number of instances when the smallest weight is very small.
The overall size of the final dataset can be limited as well.
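The weight-to-count step described above can be sketched in plain Java. This is a minimal illustration, not the filter's actual code: the class and method names are hypothetical, and normalizing by dividing every weight by the largest one is an assumption (the description does not say which normalization is used). Dropping, the factor cap and the size limit are deliberately left out.

```java
import java.util.Arrays;

// Hypothetical sketch, not Weka's implementation.
class WeightToCountSketch {

    // Assumed normalization: divide every weight by the largest one, giving values in (0, 1].
    static double[] normalizeByMax(double[] weights) {
        double max = Arrays.stream(weights).max().orElse(1.0);
        double[] normalized = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            normalized[i] = weights[i] / max;
        }
        return normalized;
    }

    // The smallest normalized weight maps to one copy; all others scale by the same factor.
    static int[] toCounts(double[] normalized) {
        double min = Arrays.stream(normalized).min().orElse(1.0);
        double factor = 1.0 / min; // e.g. a smallest weight of 0.2 gives a factor of 5
        int[] counts = new int[normalized.length];
        for (int i = 0; i < normalized.length; i++) {
            counts[i] = (int) Math.round(normalized[i] * factor);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] raw = {0.4, 2.0, 0.8};            // made-up example weights
        double[] normalized = normalizeByMax(raw); // roughly {0.2, 1.0, 0.4}
        System.out.println(Arrays.toString(toCounts(normalized)));
    }
}
```

With these made-up weights the program prints [1, 5, 2]: the smallest normalized weight (0.2) becomes one instance and the largest (1.0) becomes five, as in the example above.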
Valid options are:

-drop-below <0.0-1.0>
 The threshold for the (normalized) weight below which instances get dropped.
 default: 0.0

-drop-at-most <0.0-1.0>
 The maximum percentage of instances to drop (0-1).
 default: 1.0

-max-factor <num>
 The maximum factor by which instances may be multiplied. Disabled if <= 0.
 default: -1

-size-limit <num>
 The size limit for the resulting dataset. Disabled if <= 0; percentage if 0<x<=10 (0-10,000%); absolute number of instances if >10.
 default: -1

-seed <num>
 The seed value for randomizing the final dataset.
 default: 1

-output-debug-info
 If set, the filter is run in debug mode and may output additional info to the console.

-do-not-check-capabilities
 If set, filter capabilities are not checked before the filter is built (use with caution).
- Author:
  fracpete (fracpete at waikato dot ac dot nz)
- See Also:
  Serialized Form
Field Summary
Modifier and Type   Field          Description
protected double    m_DropAtMost   the maximum percentage (0-1) of instances to drop.
protected double    m_DropBelow    the threshold of weight below which to drop instances.
protected double    m_MaxFactor    the upper limit of the multiplication factor (<= 0 is not capped).
protected int       m_Seed         the seed for randomizing the final dataset.
protected double    m_SizeLimit    the maximum size of the dataset to generate (<= 0 is off, <= 10 is percentage, > 10 is absolute).

Constructor Summary
WeightsBasedResample()

Method Summary
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
    Determines the output format based on the input format and returns it.
String dropAtMostTipText()
    Returns the tip text for this property.
String dropBelowTipText()
    Returns the tip text for this property.
weka.core.Capabilities getCapabilities()
    Returns the Capabilities of this filter.
double getDropAtMost()
    Returns the maximum percentage of instances to drop.
double getDropBelow()
    Returns the threshold of the normalized weights below which to drop instances.
double getMaxFactor()
    Returns the upper limit for the multiplication factor for instances.
String[] getOptions()
    Gets the current settings of the filter.
String getRevision()
    Returns the revision string.
int getSeed()
    Gets the seed for the random number generation.
double getSizeLimit()
    Returns the size limit for the final dataset.
String globalInfo()
    Returns a string describing this filter.
Enumeration listOptions()
    Returns an enumeration describing the available options.
static void main(String[] args)
    Main method for testing this class.
String maxFactorTipText()
    Returns the tip text for this property.
protected weka.core.Instances process(weka.core.Instances instances)
    Processes the given data (may change the provided dataset) and returns the modified version.
String seedTipText()
    Returns the tip text for this property.
void setDropAtMost(double value)
    Sets the maximum percentage of instances to drop.
void setDropBelow(double value)
    Sets the threshold of the normalized weights below which to drop instances.
void setMaxFactor(double value)
    Sets the upper limit for the multiplication factor for instances.
void setOptions(String[] options)
    Parses a list of options for this object.
void setSeed(int seed)
    Sets the seed for random number generation.
void setSizeLimit(double value)
    Sets the size limit for the final dataset.
String sizeLimitTipText()
    Returns the tip text for this property.

Methods inherited from class weka.filters.SimpleBatchFilter
allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input

Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper

Field Detail

m_DropBelow
protected double m_DropBelow
the threshold of weight below which to drop instances.

m_DropAtMost
protected double m_DropAtMost
the maximum percentage (0-1) of instances to drop.

m_MaxFactor
protected double m_MaxFactor
the upper limit of the multiplication factor (<= 0 is not capped).

m_SizeLimit
protected double m_SizeLimit
the maximum size of the dataset to generate (<= 0 is off, <= 10 is percentage, > 10 is absolute).

m_Seed
protected int m_Seed
the seed for randomizing the final dataset.

Method Detail

globalInfo
public String globalInfo()
Returns a string describing this filter.
- Specified by:
  globalInfo in class weka.filters.SimpleFilter
- Returns:
  a description of the filter suitable for displaying in the Explorer/Experimenter GUI

listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
  listOptions in interface weka.core.OptionHandler
- Overrides:
  listOptions in class weka.filters.Filter
- Returns:
  an enumeration of all the available options.

setOptions
public void setOptions(String[] options) throws Exception
Parses a list of options for this object.
- Specified by:
  setOptions in interface weka.core.OptionHandler
- Overrides:
  setOptions in class weka.filters.Filter
- Parameters:
  options - the list of options as an array of strings
- Throws:
  Exception - if an option is not supported

getOptions
public String[] getOptions()
Gets the current settings of the filter.
- Specified by:
  getOptions in interface weka.core.OptionHandler
- Overrides:
  getOptions in class weka.filters.Filter
- Returns:
  an array of strings suitable for passing to setOptions

setDropBelow
public void setDropBelow(double value)
Sets the threshold of the normalized weights below which to drop instances.
- Parameters:
  value - the threshold (0-1)

getDropBelow
public double getDropBelow()
Returns the threshold of the normalized weights below which to drop instances.
- Returns:
  the threshold (0-1)

dropBelowTipText
public String dropBelowTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setDropAtMost
public void setDropAtMost(double value)
Sets the maximum percentage of instances to drop.
- Parameters:
  value - the percentage (0-1)
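The interplay between the drop threshold and the drop cap can be sketched as follows. This is a hypothetical illustration resting on two assumptions the documentation does not confirm: instances with the lowest normalized weight are dropped first, and the cap is floor(dropAtMost * n) instances. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch, not Weka's implementation.
class DropSketch {

    // Returns the indices of the instances that survive the drop step.
    static List<Integer> keptIndices(double[] normalized, double dropBelow, double dropAtMost) {
        int n = normalized.length;
        int maxDrop = (int) Math.floor(dropAtMost * n); // assumed rounding of the cap
        // Visit instances from the smallest to the largest normalized weight.
        List<Integer> order = IntStream.range(0, n).boxed()
                .sorted(Comparator.comparingDouble(i -> normalized[i]))
                .collect(Collectors.toList());
        boolean[] dropped = new boolean[n];
        int numDropped = 0;
        for (int i : order) {
            if (normalized[i] >= dropBelow || numDropped >= maxDrop) {
                break; // remaining weights are at or above the threshold, or the cap is reached
            }
            dropped[i] = true;
            numDropped++;
        }
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!dropped[i]) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] normalized = {0.05, 1.0, 0.1, 0.5};
        // A threshold of 0.2 alone would drop two instances; dropAtMost = 0.25 allows only one.
        System.out.println(keptIndices(normalized, 0.2, 0.25));
    }
}
```

The program prints [1, 2, 3]: only the lowest-weighted instance (index 0) is removed. With the defaults (dropBelow = 0.0, dropAtMost = 1.0) nothing is dropped, because no normalized weight is below 0.0.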

getDropAtMost
public double getDropAtMost()
Returns the maximum percentage of instances to drop.
- Returns:
  the percentage (0-1)

dropAtMostTipText
public String dropAtMostTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setMaxFactor
public void setMaxFactor(double value)
Sets the upper limit for the multiplication factor for instances. Disabled if <= 0.
- Parameters:
  value - the upper limit
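The effect of the cap can be sketched as follows: without a cap, a smallest normalized weight of 0.001 implies a factor of 1000, so an instance of weight 1.0 would be replicated about 1000 times; with a cap of 10 it is replicated at most 10 times. The names are hypothetical, and keeping at least one copy of every surviving instance is an assumption of this sketch, not documented behavior.

```java
// Hypothetical sketch, not Weka's implementation.
class FactorCapSketch {

    // Factor implied by the smallest remaining normalized weight, capped when maxFactor > 0.
    static double effectiveFactor(double minNormalizedWeight, double maxFactor) {
        double factor = 1.0 / minNormalizedWeight;
        if (maxFactor > 0 && factor > maxFactor) {
            factor = maxFactor; // a value <= 0 disables the cap, as documented
        }
        return factor;
    }

    // Number of copies for one instance; the lower bound of 1 is an assumption.
    static int copies(double normalizedWeight, double factor) {
        return Math.max(1, (int) Math.round(normalizedWeight * factor));
    }

    public static void main(String[] args) {
        double uncapped = effectiveFactor(0.001, -1);
        double capped = effectiveFactor(0.001, 10);
        System.out.println(copies(1.0, uncapped) + " vs " + copies(1.0, capped));
    }
}
```

The program prints 1000 vs 10, which illustrates the "instance explosion" the cap is meant to prevent.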

getMaxFactor
public double getMaxFactor()
Returns the upper limit for the multiplication factor for instances. Disabled if <= 0.
- Returns:
  the upper limit

maxFactorTipText
public String maxFactorTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setSizeLimit
public void setSizeLimit(double value)
Sets the size limit for the final dataset. Disabled if <= 0; percentage if 0<x<=10 (0-10,000%); absolute number of instances if >10.
- Parameters:
  value - the limit
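The three regimes of the size limit can be expressed as a small helper. This is a hypothetical sketch: it assumes that a value in (0, 10] is a multiple of the number of input instances (so 0.5 means half the input size) and that the result is rounded to the nearest integer; the documentation states only the ranges, not the exact arithmetic.

```java
// Hypothetical sketch, not Weka's implementation.
class SizeLimitSketch {

    // Maximum number of output instances, or -1 if the limit is disabled.
    static int maxInstances(double sizeLimit, int numInputInstances) {
        if (sizeLimit <= 0) {
            return -1; // disabled
        } else if (sizeLimit <= 10) {
            // relative limit; interpreting it as a multiple of the input size is an assumption
            return (int) Math.round(sizeLimit * numInputInstances);
        } else {
            return (int) sizeLimit; // absolute number of instances
        }
    }

    public static void main(String[] args) {
        System.out.println(maxInstances(-1, 200));  // disabled
        System.out.println(maxInstances(0.5, 200)); // relative
        System.out.println(maxInstances(500, 200)); // absolute
    }
}
```

For an input of 200 instances the program prints -1, 100 and 500 on separate lines.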

getSizeLimit
public double getSizeLimit()
Returns the size limit for the final dataset. Disabled if <= 0; percentage if 0<x<=10 (0-10,000%); absolute number of instances if >10.
- Returns:
  the limit

sizeLimitTipText
public String sizeLimitTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setSeed
public void setSeed(int seed)
Sets the seed for random number generation.
- Specified by:
  setSeed in interface weka.core.Randomizable
- Parameters:
  seed - the seed
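Seeding makes the final randomization reproducible: the same seed always produces the same order. The sketch below shuffles a plain list with java.util.Collections.shuffle and a seeded java.util.Random. It is an illustration only; the class name is hypothetical, and the filter itself may randomize the dataset differently (Weka's Instances class, for example, provides randomize(Random)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of seeded randomization, not Weka's implementation.
class SeededShuffleSketch {

    // Returns a shuffled copy; the input list is left untouched.
    static <T> List<T> shuffled(List<T> rows, int seed) {
        List<T> copy = new ArrayList<>(rows);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "a", "b", "c", "c", "c");
        // Two shuffles with the same seed yield the same order.
        System.out.println(shuffled(rows, 1).equals(shuffled(rows, 1)));
        System.out.println(shuffled(rows, 1));
    }
}
```

The first line printed is true; the exact order on the second line depends on the JDK's Random sequence for seed 1.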

getSeed
public int getSeed()
Gets the seed for the random number generation.
- Specified by:
  getSeed in interface weka.core.Randomizable
- Returns:
  the seed for the random number generation

seedTipText
public String seedTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the Explorer/Experimenter GUI

getCapabilities
public weka.core.Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
  getCapabilities in interface weka.core.CapabilitiesHandler
- Overrides:
  getCapabilities in class weka.filters.Filter
- Returns:
  the capabilities of this object
- See Also:
  Capabilities

determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it.
- Specified by:
  determineOutputFormat in class weka.filters.SimpleFilter
- Parameters:
  inputFormat - the input format to base the output format on
- Returns:
  the output format
- Throws:
  Exception - in case the determination goes wrong

process
protected weka.core.Instances process(weka.core.Instances instances) throws Exception
Processes the given data (may change the provided dataset) and returns the modified version. This method is called from batchFinished().
- Specified by:
  process in class weka.filters.SimpleFilter
- Parameters:
  instances - the data to process
- Returns:
  the modified data
- Throws:
  Exception - in case the processing goes wrong

getRevision
public String getRevision()
Returns the revision string.
- Specified by:
  getRevision in interface weka.core.RevisionHandler
- Overrides:
  getRevision in class weka.filters.Filter
- Returns:
  the revision

main
public static void main(String[] args)
Main method for testing this class.
- Parameters:
  args - should contain arguments to the filter: use -h for help