Batch and On-Line Parameter Estimation of Gaussian Mixtures Based on the Joint Entropy

We describe a new iterative method for parameter estimation of Gaussian mixtures. The new method is based on a framework developed by Kivinen and Warmuth for supervised on-line learning. In contrast to gradient descent and EM, which estimate the mixture's covariance matrices, the proposed method estimates the inverses of the covariance matrices. Furthermore, the new parameter estimation procedure can be applied in both on-line and batch settings. We show experimentally that it typically converges faster than EM, usually requiring about half as many iterations.
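The paper's joint-entropy update is not reproduced in this abstract, but the parameterization it contrasts with EM can be illustrated. Below is a minimal sketch of standard batch EM for a Gaussian mixture in which each component is stored and updated via its precision matrix (the inverse covariance) rather than the covariance itself; the function name and regularization constant are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def em_gmm_precisions(X, K, iters=50, seed=0):
    """Batch EM for a Gaussian mixture, storing precision matrices.

    NOTE: this is plain EM, not the paper's joint-entropy update; it
    only illustrates parameterizing components by inverse covariances.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize means at random data points, precisions at identity.
    mu = X[rng.choice(n, K, replace=False)].astype(float)
    prec = np.tile(np.eye(d), (K, 1, 1))
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities from log N(x | mu_k, prec_k^{-1}),
        # computed directly from the precision (no inversion needed).
        log_r = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            maha = np.einsum('ni,ij,nj->n', diff, prec[k], diff)
            _, logdet = np.linalg.slogdet(prec[k])
            log_r[:, k] = (np.log(pi[k]) + 0.5 * (logdet - maha)
                           - 0.5 * d * np.log(2 * np.pi))
        log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and precisions.
        nk = r.sum(axis=0)
        pi = nk / n
        for k in range(K):
            mu[k] = r[:, k] @ X / nk[k]
            diff = X - mu[k]
            cov = (r[:, k, None] * diff).T @ diff / nk[k]
            # Small ridge term (assumed) keeps the inverse well-defined.
            prec[k] = np.linalg.inv(cov + 1e-6 * np.eye(d))
    return pi, mu, prec
```

Working with precisions keeps the E-step free of matrix inversions, which is the representational choice the abstract highlights; the paper's contribution is a different update rule for these same parameters.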
