Online PCA with Optimal Regret

We investigate the online version of Principal Component Analysis (PCA), where in each trial $t$ the learning algorithm chooses a $k$-dimensional subspace and, upon receiving the next instance vector $x_t$, suffers the "compression loss": the squared Euclidean distance between the instance and its projection onto the chosen subspace. In the right parameterization this compression loss is linear, i.e. it can be rewritten as $\mathrm{tr}(W_t x_t x_t^\top)$, where $W_t$ is the parameter of the algorithm and the outer product $x_t x_t^\top$ (with $\|x_t\| \le 1$) is the instance matrix. In this paper we generalize PCA to arbitrary positive definite instance matrices $X_t$ with the linear loss $\mathrm{tr}(W_t X_t)$. We evaluate online algorithms in terms of their worst-case regret, which bounds the additional total loss of the online algorithm on any sequence of instance matrices over the compression loss of the best $k$-dimensional subspace (chosen in hindsight). We focus on two popular online algorithms for generalized PCA: the Gradient Descent (GD) and Matrix Exponentiated Gradient (MEG) algorithms. We show that if the regret is expressed as a function of the number of trials, then both algorithms are optimal to within a constant factor on worst-case sequences of positive definite instance matrices with trace norm at most one (which subsumes the original PCA problem with outer products). This is surprising because MEG is believed to be suboptimal in this case. We also show that when the regret is expressed as a function of a loss budget, MEG remains optimal and strictly outperforms GD when the instance matrices are trace norm bounded. Next, we consider online PCA when the adversary is allowed to present the algorithm with positive semidefinite instance matrices whose largest eigenvalue is bounded (rather than their trace, which is the sum of their eigenvalues). Again we show that MEG is optimal and strictly better than GD in this setting.
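To make the MEG update concrete, below is a minimal NumPy sketch of one trial of generalized online PCA with linear loss $\mathrm{tr}(W_t X_t)$. It assumes the standard capped parameter set $\{W \succeq 0,\ \lambda_i(W) \le 1,\ \mathrm{tr}(W) = n-k\}$ (the convex hull of complementary projection matrices) and a clamp-and-rescale relative-entropy projection in the style of the randomized online PCA literature; the function names `meg_step` and `project_capped` and the toy parameters are illustrative, not the paper's exact pseudocode.

```python
import numpy as np

def project_capped(lam, U, m):
    """Project eigenvalues onto {0 <= lam_i <= 1, sum(lam) = m} by clamping
    the largest eigenvalues at 1 and rescaling the rest by a common factor
    (the clamp-and-rescale form of the relative-entropy projection).
    Sketch only, under the assumptions stated above."""
    order = np.argsort(-lam)                 # sort eigenvalues, largest first
    lam, U = lam[order], U[:, order]
    for r in range(len(lam)):                # r = number of eigenvalues clamped at 1
        rest = lam[r:] * (m - r) / lam[r:].sum()
        if rest.max() <= 1.0:
            lam = np.concatenate([np.ones(r), rest])
            break
    return U @ np.diag(lam) @ U.T

def meg_step(W, X, eta, m):
    """One Matrix Exponentiated Gradient step, W' ~ exp(log W - eta * X),
    followed by projection back onto the capped set."""
    w_vals, w_vecs = np.linalg.eigh(W)
    w_vals = np.clip(w_vals, 1e-12, None)    # keep the matrix log finite
    log_W = (w_vecs * np.log(w_vals)) @ w_vecs.T
    mu, U = np.linalg.eigh(log_W - eta * X)
    return project_capped(np.exp(mu), U, m)

# Toy run: n = 4, track a k = 2 dimensional subspace, so tr(W) = m = n - k.
rng = np.random.default_rng(0)
n, k, eta = 4, 2, 0.5
m = n - k
W = np.eye(n) * m / n                        # uniform starting parameter
total_loss = 0.0
for t in range(100):
    x = rng.normal(size=n)
    x /= np.linalg.norm(x)                   # instance vector with ||x|| <= 1
    X = np.outer(x, x)                       # original PCA instance matrix
    total_loss += np.trace(W @ X)            # linear (compression) loss tr(W X)
    W = meg_step(W, X, eta, m)
```

For comparison, GD would replace the multiplicative step with the additive update $W' = W - \eta X_t$ followed by a Euclidean projection onto the same capped set; the regret comparison in the abstract is between these two update/projection pairs.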
