On Convergence Properties of the EM Algorithm for Gaussian Mixtures

We develop the mathematical connection between the Expectation-Maximization (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite Gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P, and we provide an explicit expression for this matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results on the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for learning Gaussian mixture models.
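As a minimal numerical sketch of the claim that the EM step can be written as the gradient premultiplied by a matrix P, the following Python snippet runs one EM update for the means of a 1-D two-component Gaussian mixture and checks that it coincides with a P-scaled gradient step. The setup (fixed mixing proportions and variances, the specific data, and the variable names) is illustrative and not taken from the paper; for the mean of component j, the scaling used is P_jj = sigma_j^2 / sum_i h_ij, where h_ij are the posterior responsibilities.

```python
# Illustrative sketch: one EM step for the means of a 1-D Gaussian mixture,
# compared against a gradient step scaled by a diagonal matrix P.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

pi = np.array([0.4, 0.6])          # mixing proportions (assumed fixed here)
sigma2 = np.array([1.0, 1.0])      # component variances (assumed fixed here)
mu = np.array([-1.0, 2.0])         # current estimate of the means

# E-step: responsibilities h[i, j] = P(component j | x_i)
dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
h = pi * dens
h /= h.sum(axis=1, keepdims=True)

# M-step for the means: responsibility-weighted sample mean
mu_em = (h * x[:, None]).sum(axis=0) / h.sum(axis=0)

# Gradient of the log-likelihood with respect to each mean
grad = (h * (x[:, None] - mu)).sum(axis=0) / sigma2

# EM step rewritten as a scaled gradient step: mu_new = mu + P @ grad,
# with P diagonal, P_jj = sigma2_j / sum_i h_ij
P = np.diag(sigma2 / h.sum(axis=0))
mu_grad_form = mu + P @ grad

print(np.allclose(mu_em, mu_grad_form))  # True: EM step equals the P-scaled gradient step
```

In this restricted setting the equality is exact by algebra, which is the simplest instance of the general projection-matrix relationship discussed in the abstract.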
