Manifold Optimization for Gaussian Mixture Models

We take a new look at parameter estimation for Gaussian Mixture Models (GMMs). In particular, we propose using \emph{Riemannian manifold optimization} as a powerful counterpart to Expectation Maximization (EM). An out-of-the-box invocation of manifold optimization, however, fails spectacularly: it converges to the same solution but vastly slower. Driven by intuition from manifold convexity, we then propose a reparamerization that has remarkable empirical consequences. It makes manifold optimization not only match EM---a highly encouraging result in itself given the poor record nonlinear programming methods have had against EM so far---but also outperform EM in many practical settings, while displaying much less variability in running times. We further highlight the strengths of manifold optimization by developing a somewhat tuned manifold LBFGS method that proves even more competitive and reliable than existing manifold optimization tools. We hope that our results encourage a wider consideration of manifold optimization for parameter estimation problems.

[1]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[2]  Ankur Moitra,et al.  Settling the Polynomial Learnability of Mixtures of Gaussians , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[3]  Ben J. A. Kröse,et al.  Efficient Greedy Learning of Gaussian Mixture Models , 2003, Neural Computation.

[4]  Yair Weiss,et al.  "Natural Images, Gaussian Mixtures and Dead Leaves" , 2012, NIPS.

[5]  Bamdev Mishra,et al.  Manopt, a matlab toolbox for optimization on manifolds , 2013, J. Mach. Learn. Res..

[6]  Jinwen Ma,et al.  Asymptotic Convergence Rate of the EM Algorithm for Gaussian Mixtures , 2000, Neural Computation.

[7]  R. Bhatia Positive Definite Matrices , 2007 .

[8]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[9]  Levent Tunçel,et al.  Optimization algorithms on matrix manifolds , 2009, Math. Comput..

[10]  C. Udriste,et al.  Convex Functions and Optimization Methods on Riemannian Manifolds , 1994 .

[11]  Suvrit Sra,et al.  Geometric optimisation on positive definite matrices for elliptically contoured distributions , 2013, NIPS.

[12]  Qingqing Huang,et al.  Learning Mixtures of Gaussians in High Dimensions , 2015, STOC.

[13]  R. Redner,et al.  Mixture densities, maximum likelihood, and the EM algorithm , 1984 .

[14]  Michael I. Jordan,et al.  On Convergence Properties of the EM Algorithm for Gaussian Mixtures , 1996, Neural Computation.

[15]  Martin J. Wainwright,et al.  Statistical guarantees for the EM algorithm: From population to sample-based analysis , 2014, ArXiv.

[16]  P. Deb Finite Mixture Models , 2008 .

[17]  Ami Wiesel,et al.  Geodesic Convexity and Covariance Estimation , 2012, IEEE Transactions on Signal Processing.

[18]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[19]  Kevin P. Murphy,et al.  Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[20]  Sanjoy Dasgupta,et al.  Learning mixtures of Gaussians , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[21]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[22]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[23]  Benedikt Wirth,et al.  Optimization Methods on Riemannian Manifolds and Their Application to Shape Space , 2012, SIAM J. Optim..

[24]  M. Kendall Theoretical Statistics , 1956, Nature.

[25]  Silvere Bonnabel,et al.  Stochastic Gradient Descent on Riemannian Manifolds , 2011, IEEE Transactions on Automatic Control.

[26]  Bart Vandereycken,et al.  Low-Rank Matrix Completion by Riemannian Optimization , 2013, SIAM J. Optim..

[27]  R. Monteiro,et al.  Solving Semide nite Programs via Nonlinear Programming , 1999 .

[28]  Vanderbei Robert,et al.  On Formulating Semidefinite Programming Problems as Smooth Convex Nonlinear Optimization Problems , 2000 .

[29]  Zoubin Ghahramani,et al.  Optimization with EM and Expectation-Conjugate-Gradient , 2003, ICML.

[30]  I. Moorhead,et al.  Natural Images , 2000, Perception.

[31]  John M. Lee Introduction to Smooth Manifolds , 2002 .

[32]  David G. Stork,et al.  Pattern Classification , 1973 .

[33]  Francis R. Bach,et al.  Low-Rank Optimization on the Cone of Positive Semidefinite Matrices , 2008, SIAM J. Optim..

[34]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[35]  R. Monteiro,et al.  Solving SemideÞnite Programs via Nonlinear Programming Part I: Transformations and Derivatives É , 1999 .

[36]  Daniel Gildea,et al.  Convergence of the EM Algorithm for Gaussian Mixtures with Unbalanced Mixing Coefficients , 2012, ICML.

[37]  Yurii Nesterov,et al.  Interior-point polynomial algorithms in convex programming , 1994, Siam studies in applied mathematics.

[38]  Suvrit Sra,et al.  Conic Geometric Optimization on the Manifold of Positive Definite Matrices , 2013, SIAM J. Optim..