PCA with Gaussian perturbations

Most of machine learning deals with vector parameters. Ideally we would like to take higher-order information into account and make use of matrix or even tensor parameters, but the resulting algorithms are usually inefficient. Here we address online learning with matrix parameters. It is often easy to obtain an online algorithm with good generalization performance if the current parameter matrix is eigendecomposed in each trial (at a cost of $O(n^3)$ per trial). Ideally we want to avoid these decompositions and spend $O(n^2)$ per trial, i.e. time linear in the size of the matrix data. There is a core trade-off between running time and generalization performance, measured here by the regret of the online algorithm (the total gain of the best offline predictor minus the total gain of the online algorithm). We focus on the key matrix problem of rank-$k$ Principal Component Analysis in $\mathbb{R}^n$, where $k \ll n$. There are $O(n^3)$ algorithms that achieve the optimal regret but require eigendecompositions. We develop a simple algorithm that needs $O(kn^2)$ per trial and whose regret is off by only a small factor of $O(n^{1/4})$. The algorithm is based on the Follow the Perturbed Leader paradigm: it replaces the full eigendecomposition at each trial by the problem of finding the $k$ principal components of the current covariance matrix perturbed by Gaussian noise.
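
As a rough illustration of the Follow the Perturbed Leader scheme described above, the sketch below perturbs the cumulative covariance matrix with symmetric Gaussian noise in each trial and predicts with the top-$k$ principal components of the perturbed matrix. The function name, the noise scale `sigma`, and the use of a full eigendecomposition are illustrative assumptions rather than the paper's exact algorithm or constants; a partial (Lanczos-type) eigensolver for only the $k$ leading components is what would give the $O(kn^2)$ per-trial cost discussed in the abstract.

```python
# Minimal sketch (assumptions noted above) of FPL-style online rank-k PCA:
# perturb the running covariance with symmetric Gaussian noise, then predict
# with the rank-k projection onto its leading eigenvectors.
import numpy as np

def fpl_online_pca(instances, k, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = instances[0].shape[0]
    S = np.zeros((n, n))          # cumulative covariance of past instances
    total_gain = 0.0
    for x in instances:
        # Symmetric Gaussian perturbation of the current covariance matrix.
        G = rng.normal(scale=sigma, size=(n, n))
        noise = (G + G.T) / 2.0
        # Top-k eigenvectors of the perturbed matrix (full decomposition here
        # for simplicity; a Lanczos solver would compute only k directions).
        eigvals, eigvecs = np.linalg.eigh(S + noise)
        V = eigvecs[:, -k:]       # k leading principal components
        P = V @ V.T               # rank-k projection used as the prediction
        # Gain on the current instance: captured variance x^T P x.
        total_gain += float(x @ P @ x)
        S += np.outer(x, x)       # update the covariance ("leader") statistic
    return total_gain

# Usage: random unit-norm instances in R^50, variance captured with k = 3.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xs = [v / np.linalg.norm(v) for v in rng.normal(size=(200, 50))]
    print(fpl_online_pca(xs, k=3))
```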
