Online PCA with Optimal Regrets

We carefully investigate the online version of PCA, where in each trial a learning algorithm plays a k-dimensional subspace and then suffers the compression loss incurred when the next instance is projected onto the chosen subspace. In this setting, we give regret bounds for two popular online algorithms, Gradient Descent (GD) and Matrix Exponentiated Gradient (MEG). We show that both algorithms are essentially optimal in the worst case when the regret is expressed as a function of the number of trials. This comes as a surprise, since MEG is commonly believed to perform sub-optimally when the instances are sparse. The different behavior of MEG for PCA stems mainly from the non-negativity of the loss, which makes the PCA setting qualitatively different from other settings studied in the literature. Furthermore, we show that when regret bounds are expressed as a function of a loss budget, MEG remains optimal and strictly outperforms GD.
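To make the setting concrete, below is a minimal sketch of both updates, assuming the standard convex relaxation in which the learner maintains a symmetric matrix W with eigenvalues in [0, 1] and trace k (the convex hull of rank-k projection matrices), so that the per-trial compression loss of W on instance x is x^T (I - W) x. The function names (gd_step, meg_step, project_capped_simplex, kl_project_capped) and all implementation details are illustrative, not taken from the paper; in particular, the bisection-based projections are just one convenient way to enforce the eigenvalue constraints.

```python
import numpy as np

def project_capped_simplex(mu, k, iters=60):
    """Euclidean projection of eigenvalues mu onto {0 <= l_i <= 1, sum l_i = k}.

    sum(clip(mu - nu, 0, 1)) is non-increasing in the shift nu, so bisection
    on nu converges to the feasible point.
    """
    lo, hi = mu.min() - 1.0, mu.max()
    for _ in range(iters):
        nu = (lo + hi) / 2.0
        if np.clip(mu - nu, 0.0, 1.0).sum() > k:
            lo = nu
        else:
            hi = nu
    return np.clip(mu - (lo + hi) / 2.0, 0.0, 1.0)

def gd_step(W, x, eta, k):
    """One GD update on the convex hull of rank-k projection matrices."""
    # The compression loss is x^T (I - W) x, whose gradient in W is -x x^T,
    # so the descent step adds +eta x x^T.
    W = W + eta * np.outer(x, x)
    vals, vecs = np.linalg.eigh(W)          # W stays symmetric
    vals = project_capped_simplex(vals, k)  # Frobenius-norm projection
    return (vecs * vals) @ vecs.T

def kl_project_capped(w, k):
    """Relative-entropy projection of positive eigenvalues w onto
    {l_i <= 1, sum l_i = k}: scale by beta and cap at 1, with beta chosen
    by bisection so the capped values sum to k (assumes k < len(w))."""
    lo, hi = 0.0, k / w.sum()
    while np.minimum(hi * w, 1.0).sum() < k:
        hi *= 2.0
    for _ in range(60):
        beta = (lo + hi) / 2.0
        if np.minimum(beta * w, 1.0).sum() < k:
            lo = beta
        else:
            hi = beta
    return np.minimum((lo + hi) / 2.0 * w, 1.0)

def meg_step(W, x, eta, k):
    """One MEG update (mirror descent with the von Neumann entropy)."""
    vals, vecs = np.linalg.eigh(W)
    vals = np.maximum(vals, 1e-12)          # numerical floor before the log
    L = (vecs * np.log(vals)) @ vecs.T + eta * np.outer(x, x)
    lvals, lvecs = np.linalg.eigh(L)
    w = np.exp(lvals - lvals.max())         # stabilised matrix exponential
    return (lvecs * kl_project_capped(w, k)) @ lvecs.T

# Tiny demo: n = 5 dimensions, track a k = 2 dimensional subspace.
rng = np.random.default_rng(0)
n, k, eta = 5, 2, 0.1
W_gd = W_meg = (k / n) * np.eye(n)          # uniform start, trace k
for _ in range(200):
    x = rng.normal(size=n)
    x /= np.linalg.norm(x)
    W_gd = gd_step(W_gd, x, eta, k)
    W_meg = meg_step(W_meg, x, eta, k)
print("GD loss on last x: ", x @ (np.eye(n) - W_gd) @ x)
print("MEG loss on last x:", x @ (np.eye(n) - W_meg) @ x)
```

Both updates pay one eigendecomposition per trial and differ only in the mirror map, squared Frobenius norm for GD versus von Neumann entropy for MEG, which is precisely the axis along which the two regret bounds are compared. In the full randomized algorithm the learner would sample a rank-k projection P with E[P] = W, so x^T (I - W) x above is the expected compression loss.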
