Learning Eigenvectors for Free

We extend the classical problem of predicting a sequence of outcomes from a finite alphabet to the matrix domain. In this extension, the alphabet of n outcomes is replaced by the set of all dyads, i.e. outer products uuᵀ where u is a unit-length vector in ℝⁿ. Whereas in the classical case the goal is to learn (i.e. sequentially predict as well as) the best multinomial distribution, in the matrix case we wish to learn the density matrix that best explains the observed sequence of dyads. We show how popular online algorithms for learning a multinomial distribution can be extended to learn density matrices. Intuitively, learning the n² parameters of a density matrix is much harder than learning the n parameters of a multinomial distribution. Surprisingly, we prove that the worst-case regrets of certain classical algorithms and their matrix generalizations are identical. The reason is that the worst-case sequences of dyads share a common eigensystem, i.e. the worst-case regret is already attained in the classical case. So these matrix algorithms learn the eigenvectors without incurring any additional regret.
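
To make the online prediction setting concrete, the sketch below shows one natural matrix generalization of an add-constant multinomial estimator. It assumes log loss -log tr(W uuᵀ) for a density-matrix prediction W, an add-constant update with smoothing constant c, and the function names used here; these are illustrative assumptions for this note, not details taken from the paper.

    import numpy as np

    def dyad(u):
        """Outer product uu^T of (the normalization of) a vector u."""
        u = u / np.linalg.norm(u)
        return np.outer(u, u)

    def online_density_matrix_loss(dyads, n, c=1.0):
        """Predict each dyad sequentially with a density matrix and accumulate log loss.

        Before trial t the prediction is W = (S + c*I) / (t + c*n), where S is the
        sum of the dyads seen so far; this mirrors the classical add-constant
        estimator for multinomials (an assumed, illustrative update rule).
        """
        S = np.zeros((n, n))          # running sum of observed dyads
        total_loss = 0.0
        for t, X in enumerate(dyads):
            W = (S + c * np.eye(n)) / (t + c * n)    # density matrix: PSD, trace 1
            total_loss += -np.log(np.trace(W @ X))   # log loss -log tr(W uu^T)
            S += X                                    # update sufficient statistics
        return total_loss

    # Example: a sequence of dyads built from random unit vectors in R^3
    rng = np.random.default_rng(0)
    seq = [dyad(rng.normal(size=3)) for _ in range(100)]
    print(online_density_matrix_loss(seq, n=3))

When every dyad has the form eᵢeᵢᵀ for standard basis vectors eᵢ, the matrix S stays diagonal, W reduces to the classical add-constant multinomial estimate, and the log loss coincides with the classical case; this is the regime in which, by the result stated above, the worst-case regret is already attained.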
