Detecting Misclassification Errors in Neural Networks with a Gaussian Process Model

As neural network classifiers are deployed in real-world applications, it is crucial that their predictions are not only accurate but also trustworthy. One practical solution is to assign a confidence score to each prediction and filter out low-confidence predictions. However, existing confidence metrics are not yet sufficiently reliable for this role. This paper presents RED, a framework that produces more reliable confidence scores for detecting misclassification errors. RED calibrates the classifier's inherent confidence indicators and estimates the uncertainty of the calibrated confidence scores using a Gaussian Process. Empirical comparisons with other confidence-estimation methods on 125 UCI datasets demonstrate that the approach is effective. An experiment on a vision task with a large deep learning architecture further confirms that the method scales up, and a case study involving out-of-distribution and adversarial samples shows the potential of the proposed method to improve the robustness of neural network classifiers more broadly in the future.
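To make the idea concrete, the following is a minimal sketch of the residual-based calibration the abstract describes, assuming the classifier's inherent confidence indicator is the maximum softmax probability and using scikit-learn's GaussianProcessRegressor as a stand-in for the paper's scalable GP. The function names (fit_red_gp, red_score) and the exact choice of GP inputs are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the RED idea: fit a GP to the residual between prediction
# correctness and the raw softmax confidence, then use the GP's mean to
# calibrate the score and its standard deviation as an uncertainty estimate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def fit_red_gp(features, softmax_probs, labels):
    """Fit a GP on residuals between raw confidence and correctness.

    features:      (n, d) array of network inputs or penultimate features
    softmax_probs: (n, k) array of class probabilities from the classifier
    labels:        (n,) array of ground-truth class indices
    """
    raw_conf = softmax_probs.max(axis=1)                 # classifier's own confidence
    correct = softmax_probs.argmax(axis=1) == labels     # 1 if prediction is correct
    residuals = correct.astype(float) - raw_conf         # calibration target
    X = np.hstack([features, softmax_probs])             # condition on inputs and outputs
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X, residuals)
    return gp


def red_score(gp, features, softmax_probs):
    """Return a calibrated confidence score and a GP uncertainty estimate."""
    raw_conf = softmax_probs.max(axis=1)
    X = np.hstack([features, softmax_probs])
    mean, std = gp.predict(X, return_std=True)
    return raw_conf + mean, std                          # calibrated score, uncertainty
```

Under this sketch, predictions would be filtered by thresholding the calibrated score, with the GP's standard deviation available to widen or tighten the rejection region.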
