The kernel recursive least-squares algorithm

We present a nonlinear version of the recursive least-squares (RLS) algorithm. Our algorithm performs linear regression in a high-dimensional feature space induced by a Mercer kernel and can therefore be used to recursively construct minimum mean-squared-error solutions to nonlinear least-squares problems that are frequently encountered in signal processing applications. In order to regularize solutions and keep the complexity of the algorithm bounded, we use a sequential sparsification process that admits a new input sample into the kernel representation only if its feature-space image cannot be sufficiently well approximated by combining the images of previously admitted samples. This sparsification procedure allows the algorithm to operate online, often in real time. We analyze the behavior of the algorithm, compare its scaling properties to those of support vector machines, and demonstrate its utility in solving two signal processing problems: time-series prediction and channel equalization.
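To make the sparsification step concrete, the sketch below implements the approximate-linear-dependence (ALD) admission test together with the resulting recursive coefficient updates in NumPy. The class name `KRLS`, the Gaussian kernel, and the threshold `nu` are illustrative choices not fixed by the abstract, and the recursions follow the standard presentation of the algorithm; treat this as a sketch under those assumptions rather than a reference implementation.

```python
import numpy as np

class KRLS:
    """Kernel recursive least squares with ALD sparsification (sketch)."""

    def __init__(self, kernel, nu=1e-2):
        self.kernel = kernel  # k(x, z) -> float (a Mercer kernel)
        self.nu = nu          # ALD threshold controlling dictionary growth
        self.dict = []        # admitted input samples
        self.Kinv = None      # inverse kernel matrix of the dictionary
        self.P = None         # inverse covariance of the reduced problem
        self.alpha = None     # expansion coefficients of the predictor

    def predict(self, x):
        if not self.dict:
            return 0.0
        k = np.array([self.kernel(xi, x) for xi in self.dict])
        return float(k @ self.alpha)

    def update(self, x, y):
        ktt = self.kernel(x, x)
        if not self.dict:  # the first sample always starts the dictionary
            self.dict = [x]
            self.Kinv = np.array([[1.0 / ktt]])
            self.P = np.array([[1.0]])
            self.alpha = np.array([y / ktt])
            return
        k = np.array([self.kernel(xi, x) for xi in self.dict])
        a = self.Kinv @ k    # best coefficients for approximating phi(x)
        delta = ktt - k @ a  # squared ALD residual in feature space
        err = y - k @ self.alpha
        if delta > self.nu:
            # phi(x) is not well approximated by the dictionary: admit x
            # and grow Kinv, P, and alpha by one row/column.
            m = len(self.dict)
            self.dict.append(x)
            Kinv = np.empty((m + 1, m + 1))
            Kinv[:m, :m] = self.Kinv + np.outer(a, a) / delta
            Kinv[:m, m] = Kinv[m, :m] = -a / delta
            Kinv[m, m] = 1.0 / delta
            self.Kinv = Kinv
            P = np.zeros((m + 1, m + 1))
            P[:m, :m] = self.P
            P[m, m] = 1.0
            self.P = P
            self.alpha = np.concatenate(
                [self.alpha - a * err / delta, [err / delta]])
        else:
            # Dictionary unchanged: reduced-order RLS update of alpha only.
            Pa = self.P @ a
            q = Pa / (1.0 + a @ Pa)
            self.P = self.P - np.outer(q, a @ self.P)
            self.alpha = self.alpha + (self.Kinv @ q) * err
```

As a usage illustration, fitting a noisy sine online with a Gaussian kernel admits only a small dictionary out of hundreds of inputs, which is the bounded-complexity behavior the sparsification is designed to provide:

```python
rng = np.random.default_rng(0)
xs = rng.uniform(-3.0, 3.0, size=500)
ys = np.sin(xs) + 0.1 * rng.normal(size=xs.size)

model = KRLS(kernel=lambda x, z: np.exp(-0.5 * (x - z) ** 2), nu=1e-3)
for x, y in zip(xs, ys):
    model.update(x, y)

print(len(model.dict))     # dictionary size, typically far below 500
print(model.predict(1.0))  # close to sin(1.0) ~= 0.84
```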
