A Gradient-Based Forward Greedy Algorithm for Sparse Gaussian Process Regression

In this chapter, we present a gradient-based forward greedy method for sparse approximation of the Bayesian Gaussian Process Regression (GPR) model. In contrast to previous work, which mostly relies on various basis-vector selection strategies, we propose to construct, rather than select, a new basis vector at each iterative step. This idea is motivated by the well-known gradient boosting approach. The resulting algorithm, built on standard gradient-based optimisation packages, incurs computational cost and memory requirements similar to those of other leading sparse GPR algorithms. Moreover, the proposed work is a general framework that can be extended to other popular kernel machines, including Kernel Logistic Regression (KLR) and Support Vector Machines (SVMs). Numerical experiments on a wide range of datasets demonstrate the superiority of our algorithm in terms of generalisation performance.
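
The construct-rather-than-select idea can be sketched in a few lines of code: at each greedy iteration a new basis vector (pseudo-input) is initialised and its location is then refined with a gradient-based optimiser, instead of being picked from the training set. The sketch below is a minimal illustration only; the RBF kernel, the squared-error criterion on a subset-of-regressors predictor, the use of SciPy's L-BFGS-B routine, and all function names are assumptions made for exposition, not the exact objective or updates developed in this chapter.

# Minimal illustrative sketch (not the chapter's exact algorithm): at each greedy
# step a new basis vector is constructed by gradient-based optimisation of its
# location, instead of being selected from the training inputs. The RBF kernel,
# the squared-error criterion, and all names here are assumptions for illustration.
import numpy as np
from scipy.optimize import minimize


def rbf(A, B, lengthscale=1.0):
    # RBF kernel matrix between row sets A (n x d) and B (m x d)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)


def fit_sparse_gpr(X, y, n_basis=10, noise=1e-2, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    basis = []  # constructed basis vectors (pseudo-inputs)

    for _ in range(n_basis):
        def criterion(z):
            # Residual sum of squares of a subset-of-regressors predictor
            # that includes the candidate basis vector z (assumed criterion).
            Z = np.vstack(basis + [z])
            Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
            Knm = rbf(X, Z)
            w = np.linalg.solve(Knm.T @ Knm + noise * Kmm, Knm.T @ y)
            resid = y - Knm @ w
            return float(resid @ resid)

        # Initialise the new basis vector at a randomly chosen training input;
        # L-BFGS-B then refines its location (gradients approximated numerically here).
        z0 = X[rng.integers(len(X))]
        basis.append(minimize(criterion, z0, method="L-BFGS-B").x)

    Z = np.vstack(basis)
    Knm = rbf(X, Z)
    Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
    w = np.linalg.solve(Knm.T @ Knm + noise * Kmm, Knm.T @ y)
    return Z, w


def predict(Z, w, X_new):
    return rbf(X_new, Z) @ w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
    Z, w = fit_sparse_gpr(X, y, n_basis=8, rng=rng)
    print("training MSE:", np.mean((predict(Z, w, X) - y) ** 2))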
