Variational Inference for Latent Variables and Uncertain Inputs in Gaussian Processes

The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method to learning a GP-LVM from i.i.d. observations and to learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real-world datasets, including high-resolution video data.
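
The idea of maximising a lower bound on the marginal likelihood can be sketched in its generic form. The notation here is an assumption for illustration and does not come from the abstract: Y denotes the observations, X the latent variables, p(X) their prior, and q(X) a variational distribution. This is only the standard Jensen bound; the paper's analytic bound involves further steps that are not reproduced here.

```latex
\begin{align}
\log p(Y) &= \log \int p(Y \mid X)\, p(X)\, \mathrm{d}X \\
          &\ge \int q(X) \log \frac{p(Y \mid X)\, p(X)}{q(X)}\, \mathrm{d}X \\
          &= \mathbb{E}_{q(X)}\!\left[\log p(Y \mid X)\right]
             - \mathrm{KL}\!\left(q(X)\,\|\,p(X)\right)
\end{align}
```

Maximising the right-hand side with respect to q(X) and the model hyperparameters tightens the bound, while the KL term penalises latent representations that stray far from the prior.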
