Kernelization of Matrix Updates, When and How?

Abstract. We define what it means for a learning algorithm to be kernelizable when the instances are vectors, asymmetric matrices, or symmetric matrices, respectively. In each case we characterize kernelizability in terms of the algorithm's invariance under certain orthogonal transformations. If the algorithm's action depends on the instances only through a linear prediction, then we show that in each case the linear parameter must be a particular linear combination of the instances. We give a number of examples of how to apply our methods; in particular, we show how to kernelize multiplicative updates for symmetric instance matrices.
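The following sketch (not taken from the paper) illustrates the vector-instance version of the claim for the simplest kernelizable algorithm, stochastic gradient descent on squared loss: because each additive update moves the weight vector along an instance, the parameter stays a linear combination w = Σᵢ aᵢ xᵢ of past instances, so the algorithm can be rewritten to track only the coefficients aᵢ and access instances solely through inner products, i.e., a kernel. The function names and hyperparameters are illustrative, not from the paper.

```python
import numpy as np

def primal_gd(X, y, lr=0.1, epochs=50):
    """Ordinary SGD on squared loss; maintains the weight vector w explicitly."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            # Additive update: w moves along x_t, so it stays in the span of the instances.
            w -= lr * (w @ x_t - y_t) * x_t
    return w

def kernel_gd(X, y, kernel, lr=0.1, epochs=50):
    """The same algorithm kernelized: track coefficients a with w = sum_i a_i x_i."""
    n = X.shape[0]
    a = np.zeros(n)
    # All the algorithm ever needs are pairwise inner products K_ij = k(x_i, x_j).
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        for t in range(n):
            pred = a @ K[:, t]           # <w, x_t> computed via inner products only
            a[t] -= lr * (pred - y[t])   # the update lives entirely in coefficient space
    return a

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)

w = primal_gd(X, y)
a = kernel_gd(X, y, kernel=np.dot)
# With the linear kernel the two runs are step-by-step equivalent,
# so the predictions X @ w and a @ K agree.
print(np.allclose(X @ w, a @ (X @ X.T)))
```

Replacing `np.dot` with a nonlinear kernel runs the same update implicitly in the kernel's feature space, which is exactly the payoff of knowing an algorithm is kernelizable. A multiplicative update, by contrast, does not keep the parameter in the span of the instances, which is why kernelizing it (as the paper does for symmetric matrix instances) requires a different argument.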

Wojciech Kotlowski and Manfred K. Warmuth. Kernelization of Matrix Updates, When and How? ALT, 2012.