Using the Nyström Method to Speed Up Kernel Machines

A major problem for kernel-based predictors (such as Support Vector Machines and Gaussian processes) is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We show that an approximation to the eigendecomposition of the Gram matrix can be computed by the Nyström method (which is used for the numerical solution of eigenproblems). This is achieved by carrying out an eigendecomposition on a smaller system of size m < n, and then expanding the results back up to n dimensions. The computational complexity of a predictor using this approximation is O(m²n). We report experiments on the USPS and abalone data sets and show that we can set m ≪ n without any significant decrease in the accuracy of the solution.
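To make the procedure concrete, here is a minimal NumPy sketch of the Nyström extension described in the abstract: a small m × m eigenproblem is solved on a randomly chosen landmark subset and its eigenpairs are expanded back to n dimensions, giving a rank-m approximation of the full Gram matrix. The RBF kernel, uniform landmark sampling, the eigenvalue threshold, and the toy-data reconstruction check are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel between rows of X and rows of Y -- an assumed choice.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

def nystrom_eig(X, m, gamma=1.0, rng=None):
    """Approximate the leading eigenpairs of the n x n Gram matrix
    by solving only an m x m eigenproblem (Nystrom extension)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)     # landmark subset (uniform sampling assumed)
    C = rbf_kernel(X, X[idx], gamma)               # n x m cross-kernel block
    W = C[idx]                                     # m x m Gram matrix among the landmarks
    lam_m, U_m = np.linalg.eigh(W)                 # small eigenproblem, O(m^3)
    keep = lam_m > 1e-12 * lam_m.max()             # drop numerically null directions
    lam_m, U_m = lam_m[keep], U_m[:, keep]
    # Nystrom extension: rescale eigenvalues and expand eigenvectors to n dimensions.
    lam = (n / m) * lam_m
    U = np.sqrt(m / n) * (C @ U_m) / lam_m         # columns approximate eigenvectors of K
    return lam, U

# Toy-data check: the expansion satisfies K ~ U diag(lam) U^T, i.e. K ~ C W^{-1} C^T.
X = np.random.default_rng(0).normal(size=(500, 10))
lam, U = nystrom_eig(X, m=50, gamma=0.1, rng=0)
K = rbf_kernel(X, X, gamma=0.1)
print(np.linalg.norm(K - U @ np.diag(lam) @ U.T) / np.linalg.norm(K))
```

Using these n-dimensional eigenvectors in place of the exact ones is what reduces the cost of the downstream predictor from O(n³) to O(m²n).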
