SMEM Algorithm for Mixture Models

We present a split-and-merge expectation-maximization (SMEM) algorithm to overcome the local-maxima problem in parameter estimation of finite mixture models. In the case of mixture models, local maxima often involve having too many components in one part of the space and too few in another, widely separated part. To escape such configurations, we repeatedly perform simultaneous split-and-merge operations, using a new criterion for efficiently selecting the split-and-merge candidates. We apply the proposed algorithm to the training of Gaussian mixtures and mixtures of factor analyzers on synthetic and real data, and show that the split-and-merge operations improve the likelihood of both the training data and held-out test data. We also demonstrate the practical usefulness of the proposed algorithm by applying it to image compression and pattern recognition problems.
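As a rough illustration of the idea, the merge half of the candidate-selection criterion can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: function names are illustrative, and it shows only one common form of the merge criterion, which scores a pair of components (i, j) by the overlap of their posterior responsibilities, J_merge(i, j) = Σ_n P(i|x_n) P(j|x_n). Two components that model essentially the same region of the space score high and are good merge candidates.

```python
import numpy as np

def responsibilities(X, weights, means, covs):
    """E-step of EM for a Gaussian mixture: posterior P(k | x_n) per point.

    X: (N, D) data; weights: (K,); means: (K, D); covs: (K, D, D).
    """
    N, D = X.shape
    K = len(weights)
    log_r = np.zeros((N, K))
    for k in range(K):
        diff = X - means[k]
        inv = np.linalg.inv(covs[k])
        _, logdet = np.linalg.slogdet(covs[k])
        quad = np.einsum('ni,ij,nj->n', diff, inv, diff)
        log_r[:, k] = np.log(weights[k]) - 0.5 * (D * np.log(2 * np.pi) + logdet + quad)
    log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

def merge_candidates(r):
    """Rank component pairs by J_merge(i, j) = sum_n P(i|x_n) P(j|x_n)."""
    K = r.shape[1]
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    scores = {(i, j): float(r[:, i] @ r[:, j]) for i, j in pairs}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy example: components 0 and 1 crowd the same cluster, component 2 is alone,
# so (0, 1) should be the top merge candidate.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, (100, 2)),
               rng.normal([10.0, 10.0], 1.0, (100, 2))])
weights = np.full(3, 1.0 / 3.0)
means = np.array([[0.0, 0.0], [0.5, 0.5], [10.0, 10.0]])
covs = np.array([np.eye(2)] * 3)
r = responsibilities(X, weights, means, covs)
ranked = merge_candidates(r)
```

A matching split criterion (the paper uses a local divergence between a component's density and the local empirical density) would then pick the component to split, freeing a component from an overcrowded region to cover an underpopulated one.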
