Winnowing subspaces

We generalize the Winnow algorithm for learning disjunctions to learning low-rank subspaces. Subspaces are represented by symmetric projection matrices. The online algorithm maintains its uncertainty about the hidden low-rank projection matrix as a symmetric positive definite matrix, which it updates with a version of the Matrix Exponentiated Gradient algorithm based on matrix exponentials and matrix logarithms. As with the Winnow algorithm, the bounds are logarithmic in the dimension n of the problem but linear in the rank r of the hidden subspace. We also show that, via a reduction, the algorithm can be adapted to handle arbitrary matrices of any dimension.
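To make the update concrete, here is a minimal sketch of a Matrix Exponentiated Gradient style step in the spirit of the abstract. It is an illustration under assumed details, not the paper's exact algorithm: the squared loss on trace predictions, the step size eta, the trace-one normalization, and all names (n, r, T, W, P) are assumptions made for the example.

```python
# Sketch of a Matrix Exponentiated Gradient (MEG) style update for
# learning a hidden rank-r subspace online. Assumptions (not from the
# paper): squared loss on tr(W X) predictions, trace-one normalization,
# and the hyperparameters below.
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
n, r, eta, T = 8, 2, 0.5, 200   # dimension, hidden rank, step size, rounds

def sym(A):
    """Symmetrize to guard against numerical asymmetry."""
    return (A + A.T) / 2.0

# Hidden rank-r projection matrix P onto the target subspace.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
P = Q @ Q.T

# Start maximally uncertain: W is the density matrix I/n.
W = np.eye(n) / n

total_loss = 0.0
for t in range(T):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    X = np.outer(x, x)            # instance as a rank-one symmetric matrix
    y_hat = np.trace(W @ X)       # prediction: mass of x under W
    y = np.trace(P @ X)           # target: squared length of x's projection
    total_loss += (y_hat - y) ** 2
    grad = 2.0 * (y_hat - y) * X  # gradient of the squared loss in W
    # MEG step: exponentiated-gradient update in the matrix-log domain,
    # then renormalize the trace so W stays symmetric positive definite.
    W = sym(np.real(expm(sym(logm(W)) - eta * grad)))
    W /= np.trace(W)

print(f"cumulative squared loss over {T} rounds: {total_loss:.4f}")
```

The multiplicative, log-domain form of the update is what underlies bounds logarithmic in n, analogous to how Winnow's multiplicative weights give log-dimension bounds for disjunctions; an additive gradient step on W would instead incur bounds growing with n.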
