weka.classifiers.evaluation
Class ThresholdCurve

java.lang.Object
  extended by weka.classifiers.evaluation.ThresholdCurve
All Implemented Interfaces:
RevisionHandler

public class ThresholdCurve
extends Object
implements RevisionHandler

Generates points illustrating the prediction tradeoffs that can be obtained by varying the threshold value between classes. For example, the typical threshold value of 0.5 means the predicted probability of "positive" must be higher than 0.5 for the instance to be predicted as "positive". The resulting dataset can be used to visualize the precision/recall tradeoff, or for ROC curve analysis (true positive rate vs. false positive rate). In each case, Weka simply varies the threshold on the class probability estimates. The AUC is calculated using the Mann-Whitney statistic.
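The effect of moving the threshold can be sketched in plain Java, independent of Weka. The class name, helper methods, and sample data below are hypothetical and only illustrate the idea; the sketch follows the description above in predicting "positive" when the probability is strictly higher than the threshold, which may not match how Weka treats ties.

```java
// Illustrative sketch only: plain Java, not Weka code. All names are hypothetical.
class ThresholdSweepSketch {

    /** Counts {TP, FP, TN, FN} when "positive" is predicted for probability > threshold. */
    static int[] confusionAt(double[] posProb, boolean[] isPositive, double threshold) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < posProb.length; i++) {
            boolean predictedPositive = posProb[i] > threshold;
            if (predictedPositive) {
                if (isPositive[i]) tp++; else fp++;
            } else {
                if (isPositive[i]) fn++; else tn++;
            }
        }
        return new int[] {tp, fp, tn, fn};
    }

    /** Safe ratio: returns 0 when the denominator is 0. */
    static double ratio(int num, int den) {
        return den == 0 ? 0.0 : (double) num / den;
    }

    public static void main(String[] args) {
        // Made-up probability estimates for the "positive" class and the true labels.
        double[] posProb = {0.9, 0.8, 0.6, 0.4, 0.3, 0.1};
        boolean[] isPositive = {true, true, false, true, false, false};
        for (double t : new double[] {0.2, 0.5, 0.7}) {
            int[] c = confusionAt(posProb, isPositive, t);
            double tpRate = ratio(c[0], c[0] + c[3]);    // TP / (TP + FN), i.e. recall
            double fpRate = ratio(c[1], c[1] + c[2]);    // FP / (FP + TN)
            double precision = ratio(c[0], c[0] + c[1]); // TP / (TP + FP)
            System.out.printf("threshold=%.1f TPR=%.2f FPR=%.2f precision=%.2f%n",
                    t, tpRate, fpRate, precision);
        }
    }
}
```

Each threshold yields one point; plotting TPR against FPR gives a ROC curve, and plotting precision against recall gives a precision/recall curve.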

Version:
$Revision: 8034 $
Author:
Len Trigg (len@reeltwo.com)

Field Summary
static String FALLOUT_NAME
          attribute name: Fallout
static String FALSE_NEG_NAME
          attribute name: False Negatives
static String FALSE_POS_NAME
          attribute name: False Positives
static String FMEASURE_NAME
          attribute name: FMeasure
static String FP_RATE_NAME
          attribute name: False Positive Rate
static String LIFT_NAME
          attribute name: Lift
static String PRECISION_NAME
          attribute name: Precision
static String RECALL_NAME
          attribute name: Recall
static String RELATION_NAME
          The name of the relation used in threshold curve datasets
static String SAMPLE_SIZE_NAME
          attribute name: Sample Size
static String THRESHOLD_NAME
          attribute name: Threshold
static String TP_RATE_NAME
          attribute name: True Positive Rate
static String TRUE_NEG_NAME
          attribute name: True Negatives
static String TRUE_POS_NAME
          attribute name: True Positives
 
Constructor Summary
ThresholdCurve()
           
 
Method Summary
 Instances getCurve(FastVector predictions)
          Calculates the performance stats for the default class and returns the results as a set of Instances.
 Instances getCurve(FastVector predictions, int classIndex)
          Calculates the performance stats for the desired class and returns the results as a set of Instances.
static double getNPointPrecision(Instances tcurve, int n)
          Calculates the n-point precision result, which is the precision averaged over n evenly spaced (with respect to recall) samples of the curve.
static double getPRCArea(Instances tcurve)
          Calculates the area under the precision-recall curve (AUPRC).
 String getRevision()
          Returns the revision string.
static double getROCArea(Instances tcurve)
          Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
static int getThresholdInstance(Instances tcurve, double threshold)
          Gets the index of the instance whose threshold value is closest to the desired target.
static void main(String[] args)
          Tests the ThresholdCurve generation from the command line.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

RELATION_NAME

public static final String RELATION_NAME
The name of the relation used in threshold curve datasets

See Also:
Constant Field Values

TRUE_POS_NAME

public static final String TRUE_POS_NAME
attribute name: True Positives

See Also:
Constant Field Values

FALSE_NEG_NAME

public static final String FALSE_NEG_NAME
attribute name: False Negatives

See Also:
Constant Field Values

FALSE_POS_NAME

public static final String FALSE_POS_NAME
attribute name: False Positives

See Also:
Constant Field Values

TRUE_NEG_NAME

public static final String TRUE_NEG_NAME
attribute name: True Negatives

See Also:
Constant Field Values

FP_RATE_NAME

public static final String FP_RATE_NAME
attribute name: False Positive Rate

See Also:
Constant Field Values

TP_RATE_NAME

public static final String TP_RATE_NAME
attribute name: True Positive Rate

See Also:
Constant Field Values

PRECISION_NAME

public static final String PRECISION_NAME
attribute name: Precision

See Also:
Constant Field Values

RECALL_NAME

public static final String RECALL_NAME
attribute name: Recall

See Also:
Constant Field Values

FALLOUT_NAME

public static final String FALLOUT_NAME
attribute name: Fallout

See Also:
Constant Field Values

FMEASURE_NAME

public static final String FMEASURE_NAME
attribute name: FMeasure

See Also:
Constant Field Values

SAMPLE_SIZE_NAME

public static final String SAMPLE_SIZE_NAME
attribute name: Sample Size

See Also:
Constant Field Values

LIFT_NAME

public static final String LIFT_NAME
attribute name: Lift

See Also:
Constant Field Values

THRESHOLD_NAME

public static final String THRESHOLD_NAME
attribute name: Threshold

See Also:
Constant Field Values

Constructor Detail

ThresholdCurve

public ThresholdCurve()

Method Detail

getCurve

public Instances getCurve(FastVector predictions)
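Calculates the performance stats for the default class and returns the results as a set of Instances. Each instance has the following attributes, named by the constants of this class: True Positives, False Negatives, False Positives, True Negatives, False Positive Rate, True Positive Rate, Precision, Recall, Fallout, FMeasure, Sample Size, Lift, Threshold.

For the definitions of these measures, see TwoClassStats.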

Parameters:
predictions - the predictions to base the curve on
Returns:
datapoints as a set of instances, null if no predictions have been made.
See Also:
TwoClassStats

getCurve

public Instances getCurve(FastVector predictions,
                          int classIndex)
Calculates the performance stats for the desired class and returns the results as a set of Instances.

Parameters:
predictions - the predictions to base the curve on
classIndex - index of the class of interest.
Returns:
datapoints as a set of instances.

getNPointPrecision

public static double getNPointPrecision(Instances tcurve,
                                        int n)
Calculates the n-point precision result, which is the precision averaged over n evenly spaced (with respect to recall) samples of the curve.

Parameters:
tcurve - a previously extracted threshold curve Instances.
n - the number of points to average over.
Returns:
the n-point precision.
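The idea of averaging precision at n evenly spaced recall levels can be sketched in plain Java as follows. This is a simplified illustration, not Weka's implementation: the class and method names are hypothetical, the curve is passed as two parallel arrays rather than as Instances, and each recall level is matched to the nearest curve point instead of being interpolated.

```java
// Illustrative sketch only: plain Java, not Weka's implementation. Names are hypothetical.
class NPointPrecisionSketch {

    /**
     * Averages precision over n recall levels 0, 1/(n-1), ..., 1, taking for each
     * level the curve point whose recall is nearest (no interpolation).
     * Returns NaN for an empty curve or n < 2 (an assumption of this sketch).
     */
    static double nPointPrecision(double[] recall, double[] precision, int n) {
        if (recall.length == 0 || n < 2) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double target = (double) i / (n - 1);
            int best = 0;
            for (int k = 1; k < recall.length; k++) {
                if (Math.abs(recall[k] - target) < Math.abs(recall[best] - target)) {
                    best = k;
                }
            }
            sum += precision[best];
        }
        return sum / n;
    }

    public static void main(String[] args) {
        // Made-up curve points.
        double[] recall = {0.0, 0.5, 1.0};
        double[] precision = {1.0, 0.8, 0.6};
        System.out.println(nPointPrecision(recall, precision, 3));
    }
}
```

A commonly used setting is n = 11, which samples recall at 0.0, 0.1, ..., 1.0.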

getPRCArea

public static double getPRCArea(Instances tcurve)
Calculates the area under the precision-recall curve (AUPRC).

Parameters:
tcurve - a previously extracted threshold curve Instances.
Returns:
the PRC area, or Double.NaN if tcurve was not generated by ThresholdCurve.
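One common way to approximate the area under a precision-recall curve is the trapezoidal rule over the curve points. The plain-Java sketch below shows that technique under stated assumptions: the names are hypothetical, the points are assumed to be sorted by ascending recall, and Weka's own integration scheme may differ.

```java
// Illustrative sketch only: plain Java, not Weka's implementation. Names are hypothetical.
class PrcAreaSketch {

    /** Trapezoidal area under (recall, precision) points given in ascending recall order. */
    static double area(double[] recall, double[] precision) {
        double sum = 0.0;
        for (int i = 1; i < recall.length; i++) {
            double width = recall[i] - recall[i - 1];
            double meanHeight = (precision[i] + precision[i - 1]) / 2.0;
            sum += width * meanHeight;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up curve points.
        double[] recall = {0.0, 0.5, 1.0};
        double[] precision = {1.0, 1.0, 0.5};
        System.out.println(area(recall, precision));
    }
}
```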

getROCArea

public static double getROCArea(Instances tcurve)
Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.

Parameters:
tcurve - a previously extracted threshold curve Instances.
Returns:
the ROC area, or Double.NaN if tcurve was not generated by ThresholdCurve.
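The link between the ROC area and the Wilcoxon-Mann-Whitney statistic can be illustrated directly on scored instances: the AUC equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as half. The plain-Java sketch below uses hypothetical names and works on raw scores rather than on a threshold curve, so it is not how this method is implemented.

```java
// Illustrative sketch only: plain Java, not Weka's implementation. Names are hypothetical.
class MannWhitneyAucSketch {

    /**
     * Fraction of (positive, negative) pairs in which the positive instance has
     * the higher score; tied pairs count as half. Returns NaN if either class is empty.
     */
    static double auc(double[] score, boolean[] isPositive) {
        long pairs = 0;
        double wins = 0.0;
        for (int i = 0; i < score.length; i++) {
            if (!isPositive[i]) {
                continue;
            }
            for (int j = 0; j < score.length; j++) {
                if (isPositive[j]) {
                    continue;
                }
                pairs++;
                if (score[i] > score[j]) {
                    wins += 1.0;
                } else if (score[i] == score[j]) {
                    wins += 0.5;
                }
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }

    public static void main(String[] args) {
        // Made-up scores for the "positive" class and the true labels.
        double[] score = {0.9, 0.8, 0.6, 0.4, 0.3, 0.1};
        boolean[] isPositive = {true, true, false, true, false, false};
        System.out.println(auc(score, isPositive));
    }
}
```

The quadratic pair loop keeps the definition visible; a rank-based computation gives the same value in O(n log n) time.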

getThresholdInstance

public static int getThresholdInstance(Instances tcurve,
                                       double threshold)
Gets the index of the instance whose threshold value is closest to the desired target.

Parameters:
tcurve - a set of instances that have been generated by this class
threshold - the target threshold
Returns:
the index of the instance that has threshold closest to the target, or -1 if this could not be found (i.e. no data, or bad threshold target)
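The lookup described here can be sketched in plain Java as a scan over the threshold column. The names below are hypothetical, the thresholds are passed as a plain array rather than as Instances, and treating a target outside [0, 1] (or NaN) as a "bad threshold target" is an assumption of this sketch.

```java
// Illustrative sketch only: plain Java, not Weka's implementation. Names are hypothetical.
class ClosestThresholdSketch {

    /**
     * Returns the index of the threshold closest to target, or -1 if there are
     * no thresholds or the target is NaN or outside [0, 1].
     */
    static int closestIndex(double[] thresholds, double target) {
        if (thresholds.length == 0 || Double.isNaN(target) || target < 0.0 || target > 1.0) {
            return -1;
        }
        int best = 0;
        for (int i = 1; i < thresholds.length; i++) {
            if (Math.abs(thresholds[i] - target) < Math.abs(thresholds[best] - target)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up threshold values.
        double[] thresholds = {0.1, 0.35, 0.6, 0.9};
        System.out.println(closestIndex(thresholds, 0.5));
    }
}
```

The returned index can then be used to read the other measures (for example precision and recall) at the operating point nearest the chosen threshold.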

getRevision

public String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Returns:
the revision

main

public static void main(String[] args)
Tests the ThresholdCurve generation from the command line. The classifier is currently hard-coded. Pipe in an ARFF file on standard input.

Parameters:
args - currently ignored


Copyright © 2012 University of Waikato, Hamilton, NZ. All Rights Reserved.