Greedy forward selection algorithms to Sparse Gaussian Process Regression

This paper considers the basis vector selection issue involved in forward selection algorithms for sparse Gaussian Process Regression (GPR). Firstly, we re-examine a previous basis vector selection criterion proposed by Smola and Bartlett [20], referred to as loss-smola, and give new formulae that implement this criterion under the full-greedy strategy more efficiently, in O(n^2·k_max) time instead of the original O(n^2·k_max^2), where n is the number of training examples and k_max ≪ n is the maximally allowed number of selected basis vectors. Secondly, in order to make the algorithm scale linearly in n, which is quite preferable for large datasets, we present an approximate version, loss-sun, of the loss-smola criterion. We compare the full-greedy algorithms induced by the loss-sun and loss-smola criteria, respectively, on several medium-scale datasets. In contrast to loss-smola, the advantage of the loss-sun criterion is that, when coupled with the sub-greedy scheme, it leads to an algorithm that scales as O(n·k_max^2) time and O(n·k_max) memory. Our criterion is similar to a matching pursuit approach, referred to as loss-keert, proposed very recently by Keerthi and Chu, but with different motivations. Numerical experiments on a number of large-scale datasets demonstrate that our proposed method consistently outperforms loss-keert in both generalization performance and running time. Finally, we discuss the drawbacks of the sub-greedy strategy and present two approximate full-greedy strategies, which can be applied to all three basis vector selection criteria discussed in this paper.
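To make the selection procedure concrete, below is a minimal NumPy sketch of a generic sub-greedy forward-selection loop for sparse GPR under the subset-of-regressors model. The candidate score is a simple error-reduction proxy standing in for the exact loss-smola, loss-sun, or loss-keert criteria, and all names (`subgreedy_forward_selection`, `rbf_kernel`, `cand_size`) are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Squared-exponential kernel matrix between two sets of points.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def subgreedy_forward_selection(X, y, k_max=20, noise=1e-2, cand_size=59, gamma=1.0, seed=0):
    """Sub-greedy forward selection of basis vectors for sparse GPR.

    At each step a random pool of `cand_size` unselected points is scored
    (here by the drop in regularized squared error from adding that point's
    kernel column -- a matching-pursuit-style proxy), and the best candidate
    is added to the basis set.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = []
    K_nI = np.zeros((n, 0))   # kernel columns of the selected basis vectors
    residual = y.copy()

    for _ in range(k_max):
        pool = rng.choice([i for i in range(n) if i not in selected],
                          size=min(cand_size, n - len(selected)), replace=False)
        best_i, best_score = None, -np.inf
        for i in pool:
            k_i = rbf_kernel(X, X[i:i + 1], gamma).ravel()
            # One-dimensional least-squares fit of the residual on k_i,
            # with a small ridge term playing the role of the noise variance.
            denom = k_i @ k_i + noise
            score = (k_i @ residual) ** 2 / denom   # error reduction if i is added
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
        K_nI = np.column_stack([K_nI, rbf_kernel(X, X[best_i:best_i + 1], gamma).ravel()])
        # Refit all weights on the enlarged basis (subset-of-regressors solution).
        # A full refit is used here for clarity; an incremental Cholesky update
        # would bring the per-step cost down to O(n k).
        K_II = rbf_kernel(X[selected], X[selected], gamma)
        A = K_nI.T @ K_nI + noise * K_II
        alpha = np.linalg.solve(A, K_nI.T @ y)
        residual = y - K_nI @ alpha
    return np.array(selected), alpha

# Toy usage on synthetic 1-D data.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(500, 1))
    y = np.sin(X).ravel() + 0.1 * rng.standard_normal(500)
    I, alpha = subgreedy_forward_selection(X, y, k_max=15)
    print("selected basis vectors:", I)
```

Because the candidate pool has fixed size (Smola and Bartlett suggest 59 random candidates per step), only the n-by-k_max matrix of selected kernel columns is ever stored, which is the O(n·k_max) memory footprint referred to above.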

[1] Wei Chu, et al. A matching pursuit approach to sparse Gaussian process regression, 2005, NIPS.

[2] Matthias W. Seeger, et al. Using the Nyström Method to Speed Up Kernel Machines, 2000, NIPS.

[3] Larry S. Davis, et al. Efficient Kernel Machines Using the Improved Fast Gauss Transform, 2004, NIPS.

[4] Tong Zhang. Approximation Bounds for Some Sparse Kernel Regression Algorithms, 2002, Neural Computation.

[5] Gene H. Golub, et al. Matrix Computations, 1983.

[6] Balas K. Natarajan, et al. Sparse Approximate Solutions to Linear Systems, 1995, SIAM J. Comput.

[7] Neil D. Lawrence, et al. Fast Forward Selection to Speed Up Sparse Gaussian Process Regression, 2003, AISTATS.

[8] Michael I. Jordan, et al. Kernel Independent Component Analysis, 2003.

[9] Glenn Fung, et al. Proximal Support Vector Machine Classifiers, 2001, KDD '01.

[10] Alexander Gammerman, et al. Ridge Regression Learning Algorithm in Dual Variables, 1998, ICML.

[11] Andrew W. Moore, et al. Efficient Locally Weighted Polynomial Regression Predictions, 1997, ICML.

[12] Leslie Greengard, et al. The Fast Gauss Transform, 1991, SIAM J. Sci. Comput.

[13] Alexander J. Smola, et al. Sparse Greedy Gaussian Process Regression, 2000, NIPS.

[14] Tomaso Poggio, et al. Everything old is new again: a fresh look at historical approaches in machine learning, 2002.

[15] Pascal Vincent, et al. Kernel Matching Pursuit, 2002, Machine Learning.

[16] Johan A. K. Suykens, et al. Least Squares Support Vector Machine Classifiers, 1999, Neural Processing Letters.

[17] D. J. C. MacKay. Introduction to Gaussian Processes, 1998.

[18] Katya Scheinberg, et al. Efficient SVM Training Using Low-Rank Kernel Representations, 2002, J. Mach. Learn. Res.

[19] Bernhard Schölkopf, et al. An Improved Training Algorithm for Kernel Fisher Discriminants, 2001, AISTATS.

[20] Andrew Y. Ng, et al. Fast Gaussian Process Regression Using KD-Trees, 2005, NIPS.

[21] Andy J. Keane, et al. Some Greedy Learning Algorithms for Sparse Regression and Classification with Mercer Kernels, 2003, J. Mach. Learn. Res.

[22] John Langford, et al. Cover Trees for Nearest Neighbor, 2006, ICML.

[23] Bernhard Schölkopf, et al. Sparse Greedy Matrix Approximation for Machine Learning, 2000, ICML.

[24] V. Raykar, et al. Fast Computation of Sums of Gaussians in High Dimensions, 2005.