Variational Inference for Uncertainty on the Inputs of Gaussian Process Models

The Gaussian process latent variable model (GP-LVM) provides a flexible approach to non-linear dimensionality reduction and has been widely applied. However, the current approach to training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximized over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximizing an analytic lower bound on the exact marginal likelihood. We apply this method to learning a GP-LVM from i.i.d. observations and to learning non-linear dynamical systems where the observations are temporally correlated. We show that two benefits of the variational Bayesian procedure are its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend to other purposes, such as Gaussian process regression with uncertain inputs and semi-supervised Gaussian processes. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as on challenging real-world datasets, including high-resolution video data.
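
As a rough illustration of what "approximately integrating out the latent variables" can mean, the generic variational lower bound on a marginal likelihood is sketched below. The notation is assumed for illustration and does not come from the abstract: Y denotes the observations, X the latent inputs, p(X) their prior, and q(X) a variational distribution over them. The abstract does not spell out how the bound is made analytic, so this should be read as the standard Jensen-inequality starting point rather than the paper's final objective.

```latex
\begin{align*}
\log p(Y)
  &= \log \int p(Y \mid X)\, p(X)\, \mathrm{d}X \\
  &\ge \int q(X) \log \frac{p(Y \mid X)\, p(X)}{q(X)}\, \mathrm{d}X \\
  &= \mathbb{E}_{q(X)}\!\left[ \log p(Y \mid X) \right]
     - \mathrm{KL}\!\left( q(X) \,\|\, p(X) \right)
\end{align*}
```

Maximizing the right-hand side with respect to q(X) and the model hyperparameters replaces the point estimate of X used in maximum-likelihood training with a distribution over X; the KL term penalizes latent representations that stray far from the prior.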
