Probabilistic Meta-Representations Of Neural Networks

Existing Bayesian treatments of neural networks are typically characterized by weak prior and approximate posterior distributions according to which all the weights are drawn independently. Here, we consider a richer prior distribution in which units in the network are represented by latent variables, and the weights between units are drawn conditionally on the values of those units' latent variables. This allows rich correlations between related weights, and can be seen as realizing a function prior with a Bayesian complexity regularizer ensuring simple solutions. We illustrate the resulting meta-representations and representations, elucidating the power of this prior.
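To make the construction concrete, here is a minimal sketch of such a hierarchical weight prior. It assumes Gaussian unit latents and a Gaussian conditional on each weight whose mean is a shared linear function of the concatenated latents of the two units it connects; the dimensions, the mapping `A`, and the noise scale `SIGMA_W` are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

D_Z = 4        # dimensionality of each unit's latent code (illustrative)
SIGMA_W = 0.1  # conditional noise scale on weights (illustrative)

def sample_unit_latents(n_units: int) -> np.ndarray:
    """Each unit u gets a latent code z_u ~ N(0, I_{D_Z})."""
    return rng.standard_normal((n_units, D_Z))

def conditional_weight_mean(z_in: np.ndarray, z_out: np.ndarray,
                            A: np.ndarray, b: float) -> np.ndarray:
    """Mean of w_ij as a linear function of the concatenated latents
    [z_i, z_j]; returns an (n_in, n_out) matrix of means."""
    n_in, n_out = z_in.shape[0], z_out.shape[0]
    # pair every input unit i with every output unit j
    pairs = np.concatenate(
        [np.repeat(z_in, n_out, axis=0),
         np.tile(z_out, (n_in, 1))], axis=1)       # (n_in*n_out, 2*D_Z)
    return (pairs @ A + b).reshape(n_in, n_out)

def sample_layer_weights(z_in, z_out, A, b=0.0):
    """Draw w_ij ~ N(mu(z_i, z_j), SIGMA_W^2): weights are conditionally
    independent given the unit latents, but marginally correlated
    whenever they share a unit (and hence a latent code)."""
    mu = conditional_weight_mean(z_in, z_out, A, b)
    return mu + SIGMA_W * rng.standard_normal(mu.shape)

# Draw one network from the prior: 3 inputs -> 5 hidden -> 2 outputs.
z = [sample_unit_latents(n) for n in (3, 5, 2)]
A = rng.standard_normal(2 * D_Z) / np.sqrt(2 * D_Z)  # shared "meta" mapping
W1 = sample_layer_weights(z[0], z[1], A)             # shape (3, 5)
W2 = sample_layer_weights(z[1], z[2], A)             # shape (5, 2)
```

In this sketch the per-unit codes `z` and the shared mapping `A` play the role of the meta-representation: all weights touching a given unit depend on that unit's latent code, so they co-vary under the prior, and the number of latent parameters grows with the number of units rather than the number of weights.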
