Repairing Faulty Mixture Models using Density Estimation

Previous work in mixture model clustering has focused primarily on the issue of model selection. Model scoring functions (including penalized likelihood and Bayesian approximations) can guide a search of the model parameter and structure space. Relatively little research has addressed the issue of how to move through this space. Local optimization techniques, such as expectation maximization, solve only part of the problem; we still need to move between different local optima. The traditional approach, restarting the search from different random configurations, is inefficient. We describe a more directed and controlled way of moving between local maxima. Using multi-resolution kd-trees for fast density estimation, we search by modifying models within regions where they fail to predict the datapoint density. We compare this algorithm with a canonical clustering method, finding favorable results on a variety of large, low-dimensional datasets.
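The repair idea can be sketched roughly as follows. This is a minimal illustration under assumptions the abstract does not specify: a k-nearest-neighbour density estimate built with scipy's cKDTree stands in for the paper's multi-resolution kd-trees, the only repair move shown is "add a component where the model most under-predicts the data density", and BIC is used as the model score. The helper names (knn_density, repair_step) are hypothetical, not from the paper.

```python
# Minimal sketch (not the paper's implementation): compare a nonparametric
# density estimate against the current mixture's density, propose a new
# component where the mixture under-predicts, and keep the change only if
# the BIC score improves.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma
from sklearn.mixture import GaussianMixture


def knn_density(X, k=10):
    """k-NN density estimate at each point of X (suited to low-dimensional data)."""
    n, d = X.shape
    tree = cKDTree(X)
    # distance to the k-th neighbour (the query includes the point itself)
    dist, _ = tree.query(X, k=k + 1)
    r = dist[:, -1]
    vol = (np.pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d  # volume of the d-ball
    return k / (n * vol)


def repair_step(gmm, X, k=10):
    """Propose a repaired mixture; return it if BIC improves, else keep the original."""
    p_data = knn_density(X, k)                      # empirical density estimate
    p_model = np.exp(gmm.score_samples(X))          # mixture density at each point
    worst = np.argmax(p_data / (p_model + 1e-12))   # most under-predicted point

    # Warm-start a (K+1)-component model: keep the old component means and
    # weights, and seed the new component's mean at the worst-fit point.
    K, d = gmm.means_.shape
    means = np.vstack([gmm.means_, X[worst]])
    weights = np.append(gmm.weights_ * (1 - 1.0 / (K + 1)), 1.0 / (K + 1))
    new = GaussianMixture(n_components=K + 1, means_init=means,
                          weights_init=weights, random_state=0).fit(X)
    return new if new.bic(X) < gmm.bic(X) else gmm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 0.3, size=(300, 2))
                   for m in ([0, 0], [3, 0], [0, 3])])
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # deliberately too few
    for _ in range(3):
        gmm = repair_step(gmm, X)
    print("components:", gmm.n_components, "BIC:", round(gmm.bic(X), 1))
```

The accept-only-if-the-score-improves step is what makes the moves between local maxima controlled rather than random: each repair is targeted at a region of poor density fit, and a full EM refit plus a model-score comparison decides whether the modified structure is kept.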
