Bending the Curve: Improving the ROC Curve Through Error Redistribution

Classification performance is often non-uniform over the data: some regions of the input space are easier to classify than others. Features that carry information about the "difficulty" of an example may themselves be non-discriminative and are therefore disregarded during classification. We propose a meta-learning approach in which performance is improved by post-processing, namely by establishing a dynamic threshold on the base classifier's scores. Because the base classifier is treated as a "black box," the method can be applied on top of any state-of-the-art classifier in an attempt to improve its performance. We focus on better controlling the true-positive/false-positive trade-off captured by the ROC curve, and propose an algorithm that derives optimal thresholds by redistributing the error according to features that carry difficulty information. We demonstrate the resulting benefit on both synthetic and real-life data.
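To make the idea concrete, here is a minimal sketch of difficulty-dependent thresholding. It is not the paper's algorithm: the two-group synthetic data, Gaussian score model, and grid search are all assumptions introduced for illustration. A single global threshold is compared against per-group ("easy" vs. "hard") thresholds that spend the same overall false-positive budget, redistributing false positives toward the region where they buy the most true positives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-group setting: an "easy" region with well-separated
# class-conditional scores and a "hard" region with heavy overlap.
# The group label is a non-discriminative "difficulty" feature.
def make_group(n, pos_mu, neg_mu, sigma):
    neg = rng.normal(neg_mu, sigma, n)   # negative-class scores
    pos = rng.normal(pos_mu, sigma, n)   # positive-class scores
    scores = np.concatenate([neg, pos])
    labels = np.concatenate([np.zeros(n), np.ones(n)])
    return scores, labels

s_easy, y_easy = make_group(2000, pos_mu=2.0, neg_mu=0.0, sigma=0.5)
s_hard, y_hard = make_group(2000, pos_mu=1.0, neg_mu=0.0, sigma=1.0)

def rates(scores, labels, thr):
    """True-positive and false-positive rates at a score threshold."""
    pred = scores >= thr
    tpr = pred[labels == 1].mean()
    fpr = pred[labels == 0].mean()
    return tpr, fpr

# Baseline: one global threshold chosen for a target overall FPR.
target_fpr = 0.10
all_s = np.concatenate([s_easy, s_hard])
all_y = np.concatenate([y_easy, y_hard])
neg_sorted = np.sort(all_s[all_y == 0])
thr_global = neg_sorted[int((1 - target_fpr) * len(neg_sorted))]
tpr_g, fpr_g = rates(all_s, all_y, thr_global)

# Dynamic thresholds: grid-search per-group cut-offs that keep the same
# overall FPR budget (groups are equal-sized, so overall FPR is the mean
# of the group FPRs) while maximizing the overall TPR.
grid = np.linspace(all_s.min(), all_s.max(), 200)
hard_rates = [rates(s_hard, y_hard, t) for t in grid]
best_tpr = 0.0
for t_easy in grid:
    tpr_e, fpr_e = rates(s_easy, y_easy, t_easy)
    for tpr_h, fpr_h in hard_rates:
        if (fpr_e + fpr_h) / 2 <= target_fpr:
            best_tpr = max(best_tpr, (tpr_e + tpr_h) / 2)

print(f"global TPR at FPR <= {target_fpr}:  {tpr_g:.3f}")
print(f"dynamic TPR at the same budget: {best_tpr:.3f}")
```

On this toy data the easy group needs almost no false-positive budget, so the per-group thresholds divert that budget to the hard group and raise the overall TPR at the same operating point, which is the "bending" of the ROC curve the abstract refers to.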
