Variational Auto-encoded Deep Gaussian Processes

We develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework in which the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters, which would otherwise grow linearly with the sample size. We derive a new formulation of the variational lower bound that allows most of the computation to be distributed, enabling us to handle datasets the size of those in mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges, including deep unsupervised learning and deep Bayesian optimization.
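
The two scalability ideas above, amortizing the variational parameters through an MLP and distributing the bound, can be illustrated with a minimal plain-Python sketch. It is not the paper's implementation. It rests on several assumptions of our own: the recognition model is a one-hidden-layer tanh MLP, each q(x_n) is a diagonal Gaussian, and the latent inputs have a standard-normal prior, so the KL part of the bound is a sum over data points. Only that KL term is shown. The GP terms of the full deep-GP bound are omitted. All function names (init_mlp, n_params, encode, chunk_kl, distributed_kl) are hypothetical.

```python
import math
import random

def init_mlp(d_in, d_hidden, d_latent, seed=0):
    """Hypothetical recognition MLP mapping y_n -> (mean, log-variance) of q(x_n)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W1": mat(d_hidden, d_in), "b1": [0.0] * d_hidden,
        "W2": mat(2 * d_latent, d_hidden), "b2": [0.0] * (2 * d_latent),
    }

def n_params(mlp):
    # Count all weights and biases; this depends on D, H, Q but not on N.
    total = 0
    for v in mlp.values():
        total += len(v) * len(v[0]) if isinstance(v[0], list) else len(v)
    return total

def encode(mlp, y):
    # One forward pass: shared weights produce per-point variational parameters.
    h = [math.tanh(sum(w * x for w, x in zip(row, y)) + b)
         for row, b in zip(mlp["W1"], mlp["b1"])]
    out = [sum(w * x for w, x in zip(row, h)) + b
           for row, b in zip(mlp["W2"], mlp["b2"])]
    q = len(out) // 2
    return out[:q], out[q:]  # mean, log-variance of q(x_n)

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a single data point.
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def chunk_kl(mlp, chunk):
    # "Map" step: the partial sum one worker computes over its shard.
    return sum(kl_to_standard_normal(*encode(mlp, y)) for y in chunk)

def distributed_kl(mlp, data, n_workers):
    # Shard the data, compute partial sums (sequentially here), then "reduce".
    shards = [data[i::n_workers] for i in range(n_workers)]
    return sum(chunk_kl(mlp, s) for s in shards)

if __name__ == "__main__":
    D, H, Q, N = 5, 16, 2, 1000
    mlp = init_mlp(D, H, Q)
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]
    print("amortized parameters:", n_params(mlp))
    print("free per-point parameters:", 2 * N * Q)
    print("KL (4 shards):", distributed_kl(mlp, data, 4))
```

With D=5, H=16 and Q=2 the recognition model has 164 parameters whatever N is. A free mean and variance per point would need 2NQ = 4000 parameters for N=1000. Because the term is a plain sum over points, the sharded result matches the single-pass result up to floating-point rounding. In a real system each shard would run on a separate node.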
