Regularized Least-Squares Classification

We consider the solution of binary classification problems via Tikhonov regularization in a Reproducing Kernel Hilbert Space using the square loss, and denote the resulting algorithm Regularized Least-Squares Classification (RLSC). We sketch the historical developments that led to this algorithm, and demonstrate empirically that its performance is equivalent to that of the well-known Support Vector Machine on several datasets. Whereas training an SVM requires solving a convex quadratic program, training RLSC requires only the solution of a single system of linear equations. We discuss the computational tradeoffs between RLSC and SVM, and explore the use of approximations to RLSC in situations where the full RLSC is too expensive. We also develop an elegant leave-one-out bound for RLSC that exploits the geometry of the algorithm, making a connection to recent work in algorithmic stability.
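To make the "single system of linear equations" point concrete, the following is a minimal sketch of RLSC training and prediction under common assumptions: a Gaussian (RBF) kernel, labels in {+1, -1}, and the standard Tikhonov objective whose minimizer satisfies (K + lambda * n * I) c = y. The function names (gaussian_kernel, rlsc_fit, rlsc_predict) and the parameters gamma and lam are illustrative choices, not identifiers from the paper.

# Sketch of RLSC under the assumptions stated above; not the paper's code.
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    # Pairwise squared Euclidean distances, then the RBF kernel (assumed kernel choice).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def rlsc_fit(X, y, lam=1e-2, gamma=0.1):
    # Training reduces to one symmetric positive-definite linear system:
    # (K + lam * n * I) c = y, solved directly here.
    n = X.shape[0]
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def rlsc_predict(X_train, c, X_test, gamma=0.1):
    # f(x) = sum_i c_i k(x_i, x); classify by the sign of f.
    K_test = gaussian_kernel(X_test, X_train, gamma)
    return np.sign(K_test @ c)

if __name__ == "__main__":
    # Toy usage with synthetic linearly separable data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
    c = rlsc_fit(X, y)
    print("training accuracy:", np.mean(rlsc_predict(X, c, X) == y))

In contrast, an SVM would require a quadratic-program solver at this step; for large training sets the dense solve above is typically replaced by conjugate gradient or low-rank (e.g. Nystrom-style) approximations, in the spirit of the approximations discussed in the paper.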
