Frank-Wolfe methods for geodesically convex optimization with application to the matrix geometric mean

We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for geodesically convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW attains a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian "linear oracle" required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here, too, the Riemannian "linear oracle" can be solved in closed form. For this setting, we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.
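
To make the notion of a matrix geometric mean concrete, the sketch below computes it for the special case of two positive definite matrices, where it is the midpoint of the geodesic joining them under the affine-invariant metric. This is not the RFW method: for more than two matrices no closed form is available in general, which is why iterative methods are needed. The function names (`spd_geodesic`, `geometric_mean_two`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def _sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T

def spd_geodesic(A, B, t):
    """Point at parameter t on the affine-invariant geodesic from A (t=0) to B (t=1).

    Uses the standard formula A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}.
    Assumes A and B are symmetric positive definite.
    """
    A_half = _sym_fun(A, np.sqrt)
    A_inv_half = _sym_fun(A, lambda w: 1.0 / np.sqrt(w))
    M = A_inv_half @ B @ A_inv_half
    M = (M + M.T) / 2  # symmetrize to remove floating-point asymmetry
    return A_half @ _sym_fun(M, lambda w: w ** t) @ A_half

def geometric_mean_two(A, B):
    """Geometric mean of two SPD matrices: the geodesic midpoint (t = 1/2)."""
    return spd_geodesic(A, B, 0.5)
```

A quick sanity check is that the result `G` satisfies the Riccati equation `G @ inv(A) @ G == B` up to rounding, and that for commuting (for example diagonal) inputs it reduces to the entrywise square root of the product.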
