Geometric Optimization in Machine Learning

Machine learning models often rely on sparsity, low rank, orthogonality, correlation, or graphical structure. The structure of interest in this chapter is geometric, specifically the manifold of positive definite (PD) matrices. Though these matrices recur throughout the applied sciences, our focus is on more recent developments in machine learning and optimization. In particular, we study (i) models that may be nonconvex in the Euclidean sense but are geodesically convex, that is, convex along the PD manifold; and (ii) models that are neither Euclidean convex nor geodesically convex but are nevertheless amenable to global optimization. We cover basic theory for (i) and (ii); subsequently, we present a scalable Riemannian limited-memory BFGS algorithm (that also applies to other manifolds). We highlight some applications from statistics and machine learning that benefit from the geometric structure studied.
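To make point (i) concrete, here is a minimal sketch of a function that fails the Euclidean convexity inequality yet satisfies it along a geodesic. It rests on assumptions that the abstract does not state: the PD manifold is taken with the affine-invariant metric, whose geodesic from X to Y is X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, and the example is restricted to 1x1 PD matrices (positive reals), where that geodesic reduces to x^(1-t) * y^t. The test function f(x) = (log x)^2 and the helper names are illustrative choices, not taken from the chapter.

```python
import math


def f(x: float) -> float:
    """Illustrative test function (an assumption, not from the chapter)."""
    return math.log(x) ** 2


def euclidean_point(x: float, y: float, t: float) -> float:
    """Point on the straight line segment from x to y."""
    return (1 - t) * x + t * y


def geodesic_point(x: float, y: float, t: float) -> float:
    """Point on the geodesic from x to y for 1x1 PD matrices,
    assuming the affine-invariant metric: x^(1-t) * y^t."""
    return x ** (1 - t) * y ** t


x, y, t = 10.0, 100.0, 0.5

# Right-hand side of the convexity inequality: the chord value.
chord = (1 - t) * f(x) + t * f(y)

# Left-hand sides: f evaluated at the midpoint of each kind of path.
euclid_val = f(euclidean_point(x, y, t))
geo_val = f(geodesic_point(x, y, t))

# Euclidean convexity is violated at this midpoint ...
print("violates Euclidean convexity:", euclid_val > chord)
# ... while the geodesic convexity inequality holds.
print("satisfies geodesic convexity:", geo_val <= chord)
```

In log coordinates u = log x the geodesic becomes a straight line and f becomes u^2, which is why the second inequality holds for every pair of positive reals and every t in [0, 1], not only for the values chosen here.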
