Sparse Gaussian Processes using Pseudo-inputs

We present a new Gaussian process (GP) regression model whose covariance is parameterized by the locations of M pseudo-input points, which we learn by gradient-based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method with O(M²N) training cost and O(M²) prediction cost per test case. We also find the hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with a particular input-dependent noise. It turns out to be closely related to several other sparse GP approaches, and we discuss this relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and that it significantly outperforms other approaches in this regime.
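The low-rank covariance structure behind the O(M²N) training cost can be sketched as follows. This is an illustrative NumPy reconstruction, not the authors' code: it evaluates the negative log marginal likelihood of a model of the form N(y | 0, Kₙₘ Kₘₘ⁻¹ Kₘₙ + diag(λ)), where the diagonal term plays the role of the input-dependent noise mentioned above. The kernel choice (squared-exponential with unit signal variance), the fixed hyperparameter values, and all names (`rbf`, `fitc_nll`) are assumptions for the sketch; in the actual method the pseudo-input locations `Xu` and the hyperparameters would be optimized jointly by gradient descent on this objective.

```python
import numpy as np

def rbf(X1, X2, ell=1.0):
    # Squared-exponential kernel with unit signal variance
    # (ell is an illustrative lengthscale hyperparameter)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def fitc_nll(X, y, Xu, noise=0.1, jitter=1e-8):
    """Negative log marginal likelihood of the sparse model
    N(y | 0, Q + diag(lam)), with Q = Knm Kmm^-1 Kmn the low-rank
    pseudo-input covariance and lam the input-dependent diagonal.
    Cost is O(M^2 N): the N x N matrix Q is never formed."""
    N, M = X.shape[0], Xu.shape[0]
    Kmm = rbf(Xu, Xu) + jitter * np.eye(M)   # M x M pseudo-input covariance
    Knm = rbf(X, Xu)                         # N x M cross-covariance
    L = np.linalg.cholesky(Kmm)
    V = np.linalg.solve(L, Knm.T)            # M x N, so that Q = V.T @ V
    q_diag = (V**2).sum(axis=0)              # diag(Q), computed in O(MN)
    lam = 1.0 - q_diag + noise               # prior diagonal is 1 for this kernel
    # Matrix inversion lemma on Q + diag(lam), all in M x M space:
    B = np.eye(M) + (V / lam) @ V.T          # I + V diag(lam)^-1 V.T
    LB = np.linalg.cholesky(B)
    beta = np.linalg.solve(LB, V @ (y / lam))
    logdet = np.log(lam).sum() + 2.0 * np.log(np.diag(LB)).sum()
    quad = (y**2 / lam).sum() - (beta**2).sum()
    return 0.5 * (N * np.log(2.0 * np.pi) + logdet + quad)
```

The determinant and quadratic-form identities used here (log|Q + Λ| = log|Λ| + log|I + VΛ⁻¹Vᵀ|, and the Woodbury inversion) are what reduce the per-evaluation cost from O(N³) for a full GP to O(M²N), agreeing with the complexity quoted in the abstract.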
