Efficient Regularized Least Squares Classification

Kernel-based regularized least squares (RLS) algorithms are a promising technique for classification. RLS minimizes a regularized functional directly in the reproducing kernel Hilbert space defined by a kernel. In contrast, support vector machines (SVMs) implement the structural risk minimization principle and use the kernel trick to extend it to the nonlinear case. While both approaches have a sound mathematical foundation, RLS is strikingly simple; SVMs, on the other hand, generally yield a sparse representation of the solution. In this paper, we introduce a very fast version of the RLS algorithm that maintains the achievable level of performance. The proposed algorithm computes solutions in O(m) time and O(1) space, where m is the number of training points. We demonstrate the efficacy of our very fast RLS algorithm on a number of data sets, both real and simulated.
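For reference, the sketch below shows ordinary kernel RLS, the formulation the proposed algorithm accelerates: the coefficients c of the expansion f(x) = Σ_i c_i k(x_i, x) are obtained by solving the linear system (K + λmI)c = y, where K is the m×m kernel matrix. The RBF kernel, the parameter names `gamma` and `lam`, and the λm scaling of the regularizer are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X1 and X2."""
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * sq)

def rls_train(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam*m*I) c = y for the expansion coefficients c.

    The lam*m scaling of the regularizer is one common convention;
    other formulations use lam alone.
    """
    m = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * m * np.eye(m), y)

def rls_predict(X_train, X_test, c, gamma=1.0):
    """Classify test points by the sign of f(x) = sum_i c_i k(x_i, x)."""
    return np.sign(rbf_kernel(X_test, X_train, gamma) @ c)
```

The direct solve above costs O(m³) time and O(m²) space to form and factor the kernel matrix; this is precisely the cost that motivates the O(m) time, O(1) space algorithm proposed in the paper.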
