Repairing Faulty Mixture Models using Density Estimation

Previous work in mixture model clustering has focused primarily on the issue of model selection. Model scoring functions (including penalized likelihood and Bayesian approximations) can guide a search of the model parameter and structure space. Relatively little research has addressed the issue of how to move through this space. Local optimization techniques, such as expectation maximization, solve only part of the problem; we still need to move between different local optima. The traditional approach, restarting the search from different random configurations, is inefficient. We describe a more directed and controlled way of moving between local maxima. Using multi-resolution kd-trees for fast density estimation, we search by modifying models within regions where they fail to predict the datapoint density. We compare this algorithm with a canonical clustering method, finding favorable results on a variety of large, low-dimensional datasets.
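The repair idea can be sketched roughly as follows. This is a minimal illustration under assumptions the abstract does not specify: a k-nearest-neighbour density estimate built with scipy's cKDTree stands in for the paper's multi-resolution kd-trees, the only repair move shown is "add a component where the model most under-predicts the data density", and BIC is used as the model score. The helper names (knn_density, repair_step) are hypothetical, not from the paper.

```python
# Minimal sketch (not the paper's implementation): compare a nonparametric
# density estimate against the current mixture's density, propose a new
# component where the mixture under-predicts, and keep the change only if
# the BIC score improves.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma
from sklearn.mixture import GaussianMixture


def knn_density(X, k=10):
    """k-NN density estimate at each point of X (suited to low-dimensional data)."""
    n, d = X.shape
    tree = cKDTree(X)
    # distance to the k-th neighbour (the query includes the point itself)
    dist, _ = tree.query(X, k=k + 1)
    r = dist[:, -1]
    vol = (np.pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d  # volume of the d-ball
    return k / (n * vol)


def repair_step(gmm, X, k=10):
    """Propose a repaired mixture; return it if BIC improves, else keep the original."""
    p_data = knn_density(X, k)                      # empirical density estimate
    p_model = np.exp(gmm.score_samples(X))          # mixture density at each point
    worst = np.argmax(p_data / (p_model + 1e-12))   # most under-predicted point

    # Warm-start a (K+1)-component model: keep the old component means and
    # weights, and seed the new component's mean at the worst-fit point.
    K, d = gmm.means_.shape
    means = np.vstack([gmm.means_, X[worst]])
    weights = np.append(gmm.weights_ * (1 - 1.0 / (K + 1)), 1.0 / (K + 1))
    new = GaussianMixture(n_components=K + 1, means_init=means,
                          weights_init=weights, random_state=0).fit(X)
    return new if new.bic(X) < gmm.bic(X) else gmm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 0.3, size=(300, 2))
                   for m in ([0, 0], [3, 0], [0, 3])])
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # deliberately too few
    for _ in range(3):
        gmm = repair_step(gmm, X)
    print("components:", gmm.n_components, "BIC:", round(gmm.bic(X), 1))
```

The accept-only-if-the-score-improves step is what makes the moves between local maxima controlled rather than random: each repair is targeted at a region of poor density fit, and a full EM refit plus a model-score comparison decides whether the modified structure is kept.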
