Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel

Neural networks (NNs) are widely used for real-world regression tasks, where the goal is to predict a numerical outcome such as revenue, effectiveness, or a quantitative result. In many such tasks, a point prediction is not enough: the uncertainty (i.e., the risk or confidence) of that prediction must also be estimated. Standard NNs, which are most often used in such tasks, do not provide uncertainty information. Existing approaches address this issue by combining Bayesian models with NNs, but these models are hard to implement, more expensive to train, and usually less accurate than standard NNs. This paper develops a new framework, RIO, that makes it possible to estimate uncertainty in any pretrained standard NN. The behavior of the NN is captured by modeling its prediction residuals with a Gaussian Process whose kernel includes both the NN's input and its output. The framework is evaluated on twelve real-world datasets, where it is found to (1) provide reliable estimates of uncertainty, (2) reduce the error of the point predictions, and (3) scale well to large datasets. Because RIO can be applied to any standard NN without modifying its architecture or training pipeline, it provides an important ingredient for building real-world NN applications.
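The core idea — fit a Gaussian Process to a pretrained network's residuals, with features drawn from both the network's input and its output — can be sketched as follows. This is a simplified illustration, not the paper's implementation: it concatenates the input and the NN output into a single feature vector under one RBF kernel, whereas RIO uses a composite I/O kernel (a sum of an input kernel and an output kernel) and scalable sparse GP approximations. The dataset, model sizes, and library choices here are arbitrary assumptions for the sketch.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy regression data (stand-in for a real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

# A pretrained standard NN: trained as usual, no architecture changes.
nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000,
                  random_state=0).fit(X, y)
y_hat = nn.predict(X)
residuals = y - y_hat  # what the GP will model

# "I/O" features: concatenate the NN's input with its output.
# (A single RBF over the joint features stands in for RIO's
# sum of separate input- and output-kernels.)
Z = np.hstack([X, y_hat.reshape(-1, 1)])
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                              normalize_y=True).fit(Z, residuals)

# At prediction time: correct the point prediction with the GP's
# residual mean, and report the GP's std as the uncertainty.
X_new = np.array([[0.5]])
z_new = np.hstack([X_new, nn.predict(X_new).reshape(-1, 1)])
mean_res, std_res = gp.predict(z_new, return_std=True)
corrected = nn.predict(X_new) + mean_res  # refined point prediction
print(corrected, std_res)                 # std_res > 0: the uncertainty estimate
```

Note that the GP is trained only on residuals and features derived from the frozen NN, so the original training pipeline is untouched; uncertainty estimation is added entirely post hoc.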
