Fast stochastic optimization on Riemannian manifolds

We study optimization of finite sums of \emph{geodesically} smooth functions on Riemannian manifolds. Although variance reduction techniques for finite-sum problems have attracted a surge of interest in recent years, all existing work is limited to vector space problems. We introduce \emph{Riemannian SVRG}, a new variance-reduced Riemannian optimization method, and analyze it for both geodesically smooth \emph{convex} and \emph{nonconvex} functions. Our analysis shows that Riemannian SVRG retains the advantages of the usual SVRG method, with convergence rates that additionally depend on factors determined by the curvature of the manifold. To the best of our knowledge, ours is the first \emph{fast} stochastic Riemannian method. Moreover, our work offers the first non-asymptotic complexity analysis for nonconvex Riemannian optimization (even in the batch setting). Our results have several implications; for instance, they offer a Riemannian perspective on variance-reduced PCA, which promises a short, transparent convergence analysis.
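
As a rough illustration of the kind of variance-reduced update discussed above, the following Python sketch runs an SVRG-style loop on the unit sphere to estimate a leading principal component. It is not taken from the paper: the per-sample objective f_i(x) = -(z_i^T x)^2 / 2, the function names (`rsvrg_sphere`, `exp_map`, `transport`), the step size, the epoch length, and the synthetic data are all assumptions chosen for illustration. The sketch uses the sphere's exponential map and parallel transport along the geodesic to move the reference-point correction into the tangent space at the current iterate.

```python
import numpy as np

def proj(x, g):
    # Project an ambient vector g onto the tangent space of the sphere at x.
    return g - (x @ g) * x

def exp_map(x, v):
    # Exponential map on the unit sphere: follow the geodesic from x along v.
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    y = np.cos(nv) * x + np.sin(nv) * (v / nv)
    return y / np.linalg.norm(y)  # renormalize to limit round-off drift

def transport(p, q, u):
    # Parallel transport of tangent vector u from p to q along the
    # minimizing geodesic on the sphere (assumes q != -p).
    return u - (q @ u) / (1.0 + p @ q) * (p + q)

def rgrad_i(Z, i, x):
    # Riemannian gradient of the assumed f_i(x) = -(z_i^T x)^2 / 2.
    z = Z[i]
    return proj(x, -(z @ x) * z)

def rgrad_full(Z, x):
    # Riemannian gradient of the average of all f_i.
    return proj(x, -(Z.T @ (Z @ x)) / Z.shape[0])

def rsvrg_sphere(Z, x0, step=0.005, epochs=20, inner=None, seed=0):
    # SVRG-style loop; step, epochs and inner are illustrative choices.
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    inner = n if inner is None else inner
    x_ref = x0 / np.linalg.norm(x0)
    for _ in range(epochs):
        g_ref = rgrad_full(Z, x_ref)          # full gradient at reference point
        x = x_ref.copy()
        for _ in range(inner):
            i = rng.integers(n)
            # Correction term lives in the tangent space at x_ref ...
            correction = rgrad_i(Z, i, x_ref) - g_ref
            # ... so transport it to the tangent space at x before combining.
            v = rgrad_i(Z, i, x) - transport(x_ref, x, correction)
            x = exp_map(x, -step * v)
        x_ref = x                              # last iterate becomes new reference
    return x_ref

# Synthetic data (an assumption): one direction with variance 4, the rest 1.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 10)) * np.sqrt(np.array([4.0] + [1.0] * 9))
x_hat = rsvrg_sphere(Z, rng.standard_normal(10))
```

Here `x_hat` is a unit vector that can be compared, up to sign, with the top eigenvector of the sample covariance `Z.T @ Z / len(Z)`. Using the last inner iterate as the next reference point is one possible design choice; other selection rules are equally plausible in a sketch like this.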
