Matrix Manifold Optimization for Gaussian Mixtures

We take a new look at parameter estimation for Gaussian Mixture Models (GMMs). Specifically, we advance Riemannian manifold optimization (on the manifold of positive definite matrices) as a potential replacement for Expectation Maximization (EM), which has been the de facto standard for decades. An out-of-the-box invocation of Riemannian optimization, however, fails spectacularly: it reaches the same solution as EM, but vastly more slowly. Building on intuition from geometric convexity, we propose a simple reformulation that has remarkable consequences: it makes Riemannian optimization not only match EM (a nontrivial result on its own, given the poor record nonlinear programming has had against EM), but also outperform it in many settings. To bring our ideas to fruition, we develop a well-tuned Riemannian LBFGS method that proves superior to known competing methods (e.g., Riemannian conjugate gradient). We hope that our results encourage a wider consideration of manifold optimization in machine learning and statistics.
