A New Parameter Estimation Method for Gaussian Mixtures

We describe a new iterative method for parameter estimation of Gaussian mixtures. The new method is based on a framework developed by Kivinen and Warmuth for supervised on-line learning. In contrast to gradient descent and EM, which estimate the mixture’s covariance matrices, the proposed method estimates the inverses of the covariance matrices. Furthermore, the new parameter estimation procedure can be applied in both on-line and batch settings. We show experimentally that it is typically faster than EM and usually requires about half as many iterations as EM. We also describe experiments with digit recognition that demonstrate the merits of the on-line version when the source generating the data is non-stationary.
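Two ideas from the abstract, parameterizing each component by its inverse covariance and using one update rule in both batch and on-line settings, can be illustrated with a small sketch. The Python code below is an assumption-laden illustration for a one-dimensional mixture, not the update derived in the paper: it applies an exponentiated-gradient step (in the spirit of Kivinen and Warmuth) to the mixture weights, a plain gradient step to the means, and a gradient step on the log of each precision (inverse variance) so that precisions stay positive. The function names (`gaussian`, `mean_log_likelihood`, `gradient_step`), the step sizes `eta`, and the synthetic data are hypothetical.

```python
import math
import random

# Hypothetical sketch: NOT the paper's exact update rule.


def gaussian(x, mu, lam):
    """Univariate normal density N(x | mu, 1/lam), parameterized by the precision lam."""
    return math.sqrt(lam / (2.0 * math.pi)) * math.exp(-0.5 * lam * (x - mu) ** 2)


def mean_log_likelihood(xs, w, mu, lam):
    """Average log-likelihood of the sample xs under the mixture (w, mu, lam)."""
    total = 0.0
    for x in xs:
        p = sum(wk * gaussian(x, mk, lk) for wk, mk, lk in zip(w, mu, lam))
        total += math.log(max(p, 1e-300))  # guard against underflow to 0
    return total / len(xs)


def gradient_step(xs, w, mu, lam, eta):
    """One ascent step on the mean log-likelihood of the points in xs.

    Pass the whole sample for a batch step, or a one-element list for an
    on-line step. All gradients are computed at the current parameters.
    """
    K, n = len(w), len(xs)
    g_w, g_mu, g_loglam = [0.0] * K, [0.0] * K, [0.0] * K
    for x in xs:
        dens = [gaussian(x, mu[k], lam[k]) for k in range(K)]
        p = max(sum(w[k] * dens[k] for k in range(K)), 1e-300)  # mixture density p(x)
        for k in range(K):
            gamma = w[k] * dens[k] / p                # posterior of component k
            d = x - mu[k]
            g_w[k] += dens[k] / p / n                 # d/dw_k of mean log p
            g_mu[k] += gamma * lam[k] * d / n         # d/dmu_k
            g_loglam[k] += gamma * 0.5 * (1.0 - lam[k] * d * d) / n  # d/dlog(lam_k)

    # Weights: exponentiated-gradient step, then renormalize onto the simplex.
    w_new = [w[k] * math.exp(eta * g_w[k]) for k in range(K)]
    z = sum(w_new)
    w_new = [v / z for v in w_new]
    # Means: additive gradient step.
    mu_new = [mu[k] + eta * g_mu[k] for k in range(K)]
    # Precisions: step on log(lam), i.e. a multiplicative update that keeps lam > 0.
    lam_new = [lam[k] * math.exp(eta * g_loglam[k]) for k in range(K)]
    return w_new, mu_new, lam_new


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(-2.0, 0.5) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
    init = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])

    w, mu, lam = init
    for _ in range(300):                    # batch: every step sees all the data
        w, mu, lam = gradient_step(data, w, mu, lam, eta=0.1)
    print("batch  ", round(mean_log_likelihood(data, w, mu, lam), 3), mu)

    w, mu, lam = init
    for x in rng.sample(data, len(data)):   # on-line: one point per step
        w, mu, lam = gradient_step([x], w, mu, lam, eta=0.05)
    print("on-line", round(mean_log_likelihood(data, w, mu, lam), 3), mu)
```

Passing the whole sample to `gradient_step` gives a batch step; passing a one-element list gives an on-line step, which lets the estimates keep moving as new points arrive. The sketch says nothing about speed relative to EM; that comparison is the paper's experimental claim.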

References

[1] Eric Bauer, et al. Update Rules for Parameter Estimation in Bayesian Networks, 1997, UAI.

[2] R. Redner, et al. Mixture Densities, Maximum Likelihood, and the EM Algorithm, 1984.

[3] Yoram Singer, et al. Training Algorithms for Hidden Markov Models Using Entropy Based Distance Functions, 1996, NIPS.

[4] Manfred K. Warmuth, et al. Additive versus Exponentiated Gradient Updates for Linear Prediction, 1995, STOC '95.

[5] Gene H. Golub, et al. Matrix Computations, 1983.

[6] E. Sackinger, et al. Neural-Network and k-Nearest-Neighbor Classifiers, 1991.

[7] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[8] Richard O. Duda, et al. Pattern Classification and Scene Analysis, 1974, Wiley-Interscience.

[9] D. Rubin, et al. Maximum Likelihood from Incomplete Data via the EM Algorithm, with discussion, 1977.

[10] Harris Drucker, et al. Comparison of Learning Algorithms for Handwritten Digit Recognition, 1995.

[11] Michael I. Jordan, et al. On Convergence Properties of the EM Algorithm for Gaussian Mixtures, 1996, Neural Computation.

[12] C. F. Jeff Wu. On the Convergence Properties of the EM Algorithm, 1983.

[13] Anil K. Jain, et al. Neural Networks and Pattern Recognition, 1994.

[14] Manfred K. Warmuth, et al. Worst-case Loss Bounds for Single Neurons, 1995, NIPS.

[15] Michael I. Jordan, et al. Convergence Results for the EM Approach to Mixtures of Experts Architectures, 1995, Neural Networks.

[16] Yoram Singer, et al. A Comparison of New and Old Algorithms for a Mixture Estimation Problem, 1995, COLT '95.

[17] A. Dvoretzky. On Stochastic Approximation, 1956.

[18] H. Walker, et al. An Iterative Procedure for Obtaining Maximum-Likelihood Estimates of the Parameters for a Mixture of Normal Distributions, 1978.