Nonconvex stochastic optimization on manifolds via Riemannian Frank-Wolfe methods

We study stochastic projection-free methods for constrained optimization of smooth functions on Riemannian manifolds, i.e., problems with constraints in addition to the requirement that the parameters lie on a manifold. Specifically, we introduce stochastic Riemannian Frank-Wolfe methods for nonconvex and geodesically convex problems. We present algorithms for both purely stochastic optimization and finite-sum problems. For the latter, we develop variance-reduced methods, including a Riemannian adaptation of the recently proposed Spider technique. For all settings, we recover convergence rates that are comparable to the best-known rates for their Euclidean counterparts. Finally, we discuss applications to two classic tasks: the computation of the Karcher mean of positive definite matrices and of Wasserstein barycenters for multivariate normal distributions. For both tasks, stochastic Frank-Wolfe methods yield state-of-the-art empirical performance.
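
For orientation, the two tasks named in the abstract are commonly posed as the minimization problems sketched below. The notation is an assumption and does not come from the abstract: $A_1, \dots, A_m$ are the given positive definite matrices, $w_i \ge 0$ are weights summing to one, and $X$ is the unknown. The barycenter objective is written for zero-mean normal distributions with covariances $A_i$, and any additional constraint set used by the Frank-Wolfe methods is omitted here.

```latex
\begin{align*}
\text{Karcher mean:}\quad
  & \min_{X \succ 0} \; \sum_{i=1}^{m} w_i \, \delta_R^2(X, A_i),
  \qquad \delta_R(X, A) = \bigl\| \log\bigl( X^{-1/2} A X^{-1/2} \bigr) \bigr\|_F, \\
\text{Wasserstein barycenter:}\quad
  & \min_{X \succ 0} \; \sum_{i=1}^{m} w_i \,
  \operatorname{tr}\Bigl( A_i + X - 2 \bigl( A_i^{1/2} X A_i^{1/2} \bigr)^{1/2} \Bigr).
\end{align*}
```

Both are finite sums over the $m$ input matrices, which is the structure the variance-reduced methods in the abstract are designed for.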
