Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel

Neural networks (NNs) have been used extensively for a wide spectrum of real-world regression tasks, where the goal is to predict a numerical outcome such as revenue, effectiveness, or a quantitative result. In many such tasks, the point prediction alone is not enough: the uncertainty (i.e., the risk or confidence) of that prediction must also be estimated. Standard NNs, which are most often used in such tasks, do not provide uncertainty information. Existing approaches address this issue by combining Bayesian models with NNs, but the resulting models are harder to implement, more expensive to train, and usually do not predict as accurately as standard NNs. In this paper, a new framework, RIO (Residual estimation with an I/O kernel), is developed that makes it possible to estimate uncertainty in any pretrained standard NN. The behavior of the NN is captured by modeling its prediction residuals with a Gaussian Process, whose kernel includes both the NN's input and its output. The framework is evaluated on twelve real-world datasets, where it is found to (1) provide reliable estimates of uncertainty, (2) reduce the error of the point predictions, and (3) scale well to large datasets. Since RIO can be applied to any standard NN without modifications to the model architecture or training pipeline, it provides an important ingredient for building real-world NN applications.
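To make the core computation concrete, the sketch below shows a minimal NumPy rendering of the idea, under stated assumptions: a GP is fit to the residuals of a pretrained NN using a kernel that sums a squared-exponential component on the raw inputs and one on the NN's outputs, and its posterior is then used to correct the point prediction and report a standard deviation. The helper names (rbf, io_kernel, rio_fit_predict, nn_predict) and the fixed hyperparameters (ls_in, ls_out, var, noise) are illustrative, not the paper's interface; the full framework instead tunes kernel hyperparameters by maximizing the marginal likelihood and uses sparse GP approximations to scale to large datasets.

```python
# Minimal sketch of residual estimation with an I/O kernel (RIO-style).
# Hyperparameters are fixed here for clarity; names are illustrative.
import numpy as np

def rbf(A, B, lengthscale, variance):
    # Squared-exponential kernel between the rows of A and B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def io_kernel(X1, Y1, X2, Y2, ls_in=1.0, ls_out=1.0, var=1.0):
    # I/O kernel: sum of an input-space kernel and an NN-output-space kernel.
    return rbf(X1, X2, ls_in, var) + rbf(Y1, Y2, ls_out, var)

def rio_fit_predict(X_train, y_train, nn_predict, X_test, noise=1e-2):
    # Residuals of the pretrained NN on the training set.
    yhat_train = nn_predict(X_train).reshape(-1, 1)
    yhat_test = nn_predict(X_test).reshape(-1, 1)
    r = y_train.ravel() - yhat_train.ravel()

    # Standard GP regression on the residuals, using the I/O kernel.
    K = io_kernel(X_train, yhat_train, X_train, yhat_train) + noise * np.eye(len(X_train))
    Ks = io_kernel(X_test, yhat_test, X_train, yhat_train)
    Kss_diag = np.diag(io_kernel(X_test, yhat_test, X_test, yhat_test))

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))  # K^{-1} r via Cholesky
    mean_residual = Ks @ alpha                           # GP posterior mean of residual
    v = np.linalg.solve(L, Ks.T)
    var_pred = Kss_diag - np.sum(v**2, axis=0) + noise   # GP predictive variance

    # Corrected point prediction plus an uncertainty estimate.
    return yhat_test.ravel() + mean_residual, np.sqrt(np.maximum(var_pred, 0.0))
```

Because the GP models residuals rather than the targets themselves, the pretrained NN is left untouched: dropping the GP simply recovers the original point predictions, while keeping it yields both a corrected prediction and a calibrated spread around it.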
