Dropout as a Bayesian Approximation: Insights and Applications

Deep learning techniques are used increasingly often, but they lack the ability to reason about uncertainty over the features: features extracted from a dataset are given as point estimates and do not capture how confident the model is in its estimates. This is in contrast to probabilistic Bayesian models, which allow reasoning about model confidence, but often at the price of diminished performance. We show that a multilayer perceptron (MLP) with arbitrary depth and non-linearities, with dropout applied after every weight layer, is mathematically equivalent to an approximation to a well-known Bayesian model. This interpretation offers an explanation of some of dropout's key properties, such as its robustness to over-fitting. It also allows us to reason about uncertainty in deep learning and to introduce the Bayesian machinery into existing deep learning frameworks in a principled way. Our analysis suggests straightforward generalisations of dropout for future research that should improve on current techniques.
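
This interpretation suggests a simple recipe for extracting predictive uncertainty from a dropout network: keep dropout active at test time and average several stochastic forward passes (often called Monte Carlo dropout), using the spread of the sampled predictions as an uncertainty estimate. The sketch below is a minimal illustration in PyTorch; the layer sizes, dropout probability, and the helper name `mc_dropout_predict` are illustrative assumptions rather than the paper's exact construction.

```python
import torch
import torch.nn as nn

class DropoutMLP(nn.Module):
    """A small MLP with dropout applied after every weight layer.
    Layer sizes and dropout probability are illustrative assumptions."""
    def __init__(self, d_in=1, d_hidden=50, d_out=1, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_samples=100):
    """Keep dropout stochastic at test time and collect several forward
    passes; the sample mean approximates the predictive mean and the
    sample variance reflects model uncertainty."""
    model.train()  # train mode keeps the dropout masks stochastic
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.var(dim=0)

# Usage (untrained weights, purely illustrative):
model = DropoutMLP()
x = torch.linspace(-3, 3, 20).unsqueeze(1)
mean, var = mc_dropout_predict(model, x)
```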
