Class NominalToNumeric
- java.lang.Object
- weka.filters.Filter
- weka.filters.SimpleFilter
- weka.filters.SimpleStreamFilter
- weka.filters.unsupervised.attribute.NominalToNumeric
- All Implemented Interfaces:
Serializable, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.filters.StreamableFilter
public class NominalToNumeric extends weka.filters.SimpleStreamFilter
Converts a nominal attribute into a numeric one. The filter can either use the internal representation of the labels as the numeric value or parse the label itself (a subset of the label can be extracted via a regular expression).
Valid options are:
-index <value> The index of the attribute to convert; an index is a number starting with 1; apart from attribute names (case-sensitive), the following placeholders can be used as well: first, second, third, last_2, last_1, last; numeric indices can be enforced by preceding them with '#' (e.g. '#12'); attribute names can be surrounded by double quotes. (default: index=last, max=-1)
-type <value> The type of conversion to perform. (default: INTERNAL_REPRESENTATION)
-find <value> The regular expression to use for extracting the numeric part from the label; use .* to match label as a whole. (default: .*)
-replace <value> The expression to use for assembling the numeric part; use $0 to use label as is. (default: $0)
-output-debug-info If set, filter is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, filter capabilities are not checked before filter is built (use with caution).
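The interplay of -find and -replace can be illustrated with plain Java. The following is a minimal sketch, not the filter's actual implementation: it assumes the substitution behaves like String.replaceAll and that the result is parsed with Double.parseDouble. The class LabelParsingSketch and its method parseLabel are hypothetical names introduced for this example.

```java
/** Hypothetical helper, not part of the filter: mimics the -find/-replace options. */
final class LabelParsingSketch {

  /**
   * Applies the find/replace pair to a label and parses the result as a double.
   * Assumption: the substitution follows String.replaceAll semantics.
   */
  static double parseLabel(String label, String find, String replace) {
    String numericPart = label.replaceAll(find, replace);
    return Double.parseDouble(numericPart);
  }

  public static void main(String[] args) {
    // Defaults (.* and $0): the label is used as is.
    System.out.println(parseLabel("42", ".*", "$0"));                // prints 42.0
    // Capture group: only the digits after the underscore are kept.
    System.out.println(parseLabel("size_12", ".*_([0-9]+)", "$1"));  // prints 12.0
  }
}
```

With the defaults, a label that is not a valid number (for example "high") would make Double.parseDouble throw a NumberFormatException in this sketch; how the filter itself handles such labels is not stated here.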
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
static class NominalToNumeric.ConversionType
Enumeration of conversion types.
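For the INTERNAL_REPRESENTATION conversion type, it helps to recall that Weka stores a nominal value as the position of its label in the attribute's label list. The sketch below models that idea with a plain list; InternalRepresentationSketch and internalValue are hypothetical names, and the code does not call into Weka.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the filter: models the internal value of a nominal label. */
final class InternalRepresentationSketch {

  /** Returns the 0-based position of the label in the declared label list as a double. */
  static double internalValue(List<String> labels, String label) {
    int position = labels.indexOf(label);
    if (position < 0)
      throw new IllegalArgumentException("Unknown label: " + label);
    return position;
  }

  public static void main(String[] args) {
    // Labels in declaration order, as they would appear in an ARFF header.
    List<String> labels = Arrays.asList("low", "medium", "high");
    System.out.println(internalValue(labels, "low"));   // prints 0.0
    System.out.println(internalValue(labels, "high"));  // prints 2.0
  }
}
```

The resulting numbers depend only on declaration order, so they carry meaning only if the labels were declared in a meaningful order.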
-
Field Summary
static String FIND
static String INDEX
protected int m_AttIndex
the attribute index.
protected BaseRegExp m_Find
the regular expression to use.
protected WekaAttributeIndex m_Index
the attribute to convert.
protected Map<String,Double> m_Mapping
the mapping between label and new value.
protected String m_Replace
the replacement string.
protected NominalToNumeric.ConversionType m_Type
the type of conversion to perform.
static String REPLACE
static String TYPE
-
Constructor Summary
NominalToNumeric()
-
Method Summary
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat)
Determines the output format based on the input format and returns it.
String findTipText()
Returns the tip text for this property.
weka.core.Capabilities getCapabilities()
protected BaseRegExp getDefaultFind()
Returns the default regular expression for extracting the numeric part from the label.
protected WekaAttributeIndex getDefaultIndex()
Returns the default attribute index.
protected String getDefaultReplace()
Returns the default expression for assembling the numeric part.
protected NominalToNumeric.ConversionType getDefaultType()
Returns the default conversion type.
BaseRegExp getFind()
Returns the regular expression to use for extracting the numeric part from the label.
WekaAttributeIndex getIndex()
Returns the index of the attribute to convert.
String[] getOptions()
Gets the current option settings for the OptionHandler.
String getReplace()
Returns the expression to use for assembling the numeric part.
String getRevision()
Returns the revision string.
NominalToNumeric.ConversionType getType()
Returns the conversion type to use.
String globalInfo()
Returns a string describing this filter.
String indexTipText()
Returns the tip text for this property.
Enumeration listOptions()
Returns an enumeration describing the available options.
static void main(String[] args)
Main method for testing this class.
protected weka.core.Instance process(weka.core.Instance instance)
Processes the given instance (may change the provided instance) and returns the modified version.
String replaceTipText()
Returns the tip text for this property.
protected void reset()
Resets the filter.
void setFind(BaseRegExp value)
Sets the regular expression to use for extracting the numeric part from the label.
void setIndex(WekaAttributeIndex value)
Sets the index of the attribute to convert.
void setOptions(String[] options)
Sets the OptionHandler's options using the given list.
void setReplace(String value)
Sets the expression to use for assembling the numeric part.
void setType(NominalToNumeric.ConversionType value)
Sets the conversion type to use.
String typeTipText()
Returns the tip text for this property.
-
Methods inherited from class weka.filters.SimpleStreamFilter
batchFinished, hasImmediateOutputFormat, input, preprocess, process
-
Methods inherited from class weka.filters.Filter
batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
-
-
-
-
Field Detail
-
INDEX
public static final String INDEX
- See Also:
- Constant Field Values
-
TYPE
public static final String TYPE
- See Also:
- Constant Field Values
-
FIND
public static final String FIND
- See Also:
- Constant Field Values
-
REPLACE
public static final String REPLACE
- See Also:
- Constant Field Values
-
m_Index
protected WekaAttributeIndex m_Index
the attribute to convert.
-
m_Type
protected NominalToNumeric.ConversionType m_Type
the type of conversion to perform.
-
m_Find
protected BaseRegExp m_Find
the regular expression to use.
-
m_Replace
protected String m_Replace
the replacement string.
-
m_AttIndex
protected int m_AttIndex
the attribute index.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
- Specified by:
globalInfo in class weka.filters.SimpleFilter
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter gui
-
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface weka.core.OptionHandler
- Overrides:
listOptions in class weka.filters.Filter
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).
- Specified by:
setOptions in interface weka.core.OptionHandler
- Overrides:
setOptions in class weka.filters.Filter
- Parameters:
options - the list of options as an array of strings
- Throws:
Exception - if an option is not supported
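The "set (or reset)" behaviour described above means that any option missing from the array falls back to its default. The sketch below models this with a plain map and does not call the filter; OptionResetSketch, defaults and applyOptions are hypothetical names, the default values are taken from the option list of this class, and only flag/value pairs are handled (the two boolean flags are left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of non-incremental option handling, not the filter's own parser. */
final class OptionResetSketch {

  /** Defaults as listed in the option description of this class. */
  static Map<String, String> defaults() {
    Map<String, String> result = new LinkedHashMap<>();
    result.put("-index", "last");
    result.put("-type", "INTERNAL_REPRESENTATION");
    result.put("-find", ".*");
    result.put("-replace", "$0");
    return result;
  }

  /** Starts from the defaults on every call, then overrides what the array specifies. */
  static Map<String, String> applyOptions(String[] options) {
    Map<String, String> result = defaults();
    for (int i = 0; i + 1 < options.length; i += 2) {
      if (!result.containsKey(options[i]))
        throw new IllegalArgumentException("Unsupported option: " + options[i]);
      result.put(options[i], options[i + 1]);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(applyOptions(new String[]{"-find", "([0-9]+)"}));
    // The second call does not remember the earlier -find value.
    System.out.println(applyOptions(new String[]{"-replace", "$1"}));
  }
}
```

In practice this suggests passing the complete set of desired options in a single array rather than calling setOptions several times.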
-
getOptions
public String[] getOptions()
Gets the current option settings for the OptionHandler.
- Specified by:
getOptions in interface weka.core.OptionHandler
- Overrides:
getOptions in class weka.filters.Filter
- Returns:
- the list of current option settings as an array of strings
-
reset
protected void reset()
Resets the filter.
- Overrides:
reset in class weka.filters.SimpleFilter
-
getDefaultIndex
protected WekaAttributeIndex getDefaultIndex()
Returns the default attribute index.
- Returns:
- the default
-
setIndex
public void setIndex(WekaAttributeIndex value)
Sets the index of the attribute to convert.
- Parameters:
value - the index
-
getIndex
public WekaAttributeIndex getIndex()
Returns the index of the attribute to convert.
- Returns:
- the index
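The index accepts placeholders, attribute names and 1-based numbers. The sketch below shows one way such a string could be resolved to a 0-based attribute position. It is not the WekaAttributeIndex implementation: IndexPlaceholderSketch and resolve are hypothetical names, and reading last_1 as "one before last" and last_2 as "two before last", as well as trying names before plain numbers, are assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical resolver for index strings, not the WekaAttributeIndex implementation. */
final class IndexPlaceholderSketch {

  /** Returns the 0-based attribute position, or -1 if the index cannot be resolved. */
  static int resolve(String index, List<String> names) {
    int n = names.size();
    int pos;
    switch (index) {
      case "first":  pos = 0; break;
      case "second": pos = 1; break;
      case "third":  pos = 2; break;
      case "last":   pos = n - 1; break;
      case "last_1": pos = n - 2; break;  // assumption: one before last
      case "last_2": pos = n - 3; break;  // assumption: two before last
      default:       pos = resolveNameOrNumber(index, names);
    }
    return (pos >= 0 && pos < n) ? pos : -1;
  }

  private static int resolveNameOrNumber(String index, List<String> names) {
    // A leading '#' enforces the numeric interpretation.
    if (index.startsWith("#"))
      return parseOneBased(index.substring(1));
    // Attribute names may be surrounded by double quotes.
    String name = index;
    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\""))
      name = name.substring(1, name.length() - 1);
    int byName = names.indexOf(name);  // case-sensitive match
    return (byName >= 0) ? byName : parseOneBased(name);
  }

  private static int parseOneBased(String s) {
    try {
      return Integer.parseInt(s) - 1;
    }
    catch (NumberFormatException e) {
      return -1;
    }
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("id", "width", "12", "class");
    System.out.println(resolve("last", names));  // prints 3
    System.out.println(resolve("12", names));    // prints 2 (matches the attribute named "12")
    System.out.println(resolve("#2", names));    // prints 1 (second attribute)
  }
}
```

The example with an attribute named "12" shows why the '#' prefix exists: without it, a purely numeric string could be taken for an attribute name.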
-
indexTipText
public String indexTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getDefaultType
protected NominalToNumeric.ConversionType getDefaultType()
Returns the default conversion type.
- Returns:
- the default
-
setType
public void setType(NominalToNumeric.ConversionType value)
Sets the conversion type to use.
- Parameters:
value - the type
-
getType
public NominalToNumeric.ConversionType getType()
Returns the conversion type to use.
- Returns:
- the type
-
typeTipText
public String typeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getDefaultFind
protected BaseRegExp getDefaultFind()
Returns the default regular expression for extracting the numeric part from the label.
- Returns:
- the default
-
setFind
public void setFind(BaseRegExp value)
Sets the regular expression to use for extracting the numeric part from the label.
- Parameters:
value - the regexp
-
getFind
public BaseRegExp getFind()
Returns the regular expression to use for extracting the numeric part from the label.
- Returns:
- the regexp
-
findTipText
public String findTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getDefaultReplace
protected String getDefaultReplace()
Returns the default expression for assembling the numeric part.
- Returns:
- the default
-
setReplace
public void setReplace(String value)
Sets the expression to use for assembling the numeric part.
- Parameters:
value - the expression
-
getReplace
public String getReplace()
Returns the expression to use for assembling the numeric part.
- Returns:
- the expression
-
replaceTipText
public String replaceTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
getCapabilities
public weka.core.Capabilities getCapabilities()
- Specified by:
getCapabilities in interface weka.core.CapabilitiesHandler
- Overrides:
getCapabilities in class weka.filters.Filter
-
determineOutputFormat
protected weka.core.Instances determineOutputFormat(weka.core.Instances inputFormat) throws Exception
Determines the output format based on the input format and returns it. If the output format cannot be returned immediately, i.e., hasImmediateOutputFormat() returns false, then this method will be called from batchFinished() after the call of preprocess(Instances), in which, e.g., statistics for the actual processing step can be gathered.
- Specified by:
determineOutputFormat in class weka.filters.SimpleStreamFilter
- Parameters:
inputFormat - the input format to base the output format on
- Returns:
- the output format
- Throws:
Exception - in case the determination goes wrong
-
process
protected weka.core.Instance process(weka.core.Instance instance) throws Exception
Processes the given instance (may change the provided instance) and returns the modified version.
- Specified by:
process in class weka.filters.SimpleStreamFilter
- Parameters:
instance - the instance to process
- Returns:
- the modified data
- Throws:
Exception - in case the processing goes wrong
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface weka.core.RevisionHandler
- Overrides:
getRevision in class weka.filters.Filter
- Returns:
- the revision
-
main
public static void main(String[] args)
Main method for testing this class.
- Parameters:
args - should contain arguments to the filter: use -h for help
-
-