Mixture models, in which a probability distribution is represented as a linear superposition of component distributions, are widely used in statistical modelling and pattern recognition. One of the key tasks in the application of mixture models is the determination of a suitable number of components. Conventional approaches based on cross-validation are computationally expensive, are wasteful of data, and give noisy estimates for the optimal number of components. A fully Bayesian treatment, based for instance on Markov chain Monte Carlo methods, returns a posterior distribution over the number of components. However, in practical applications it is generally convenient, or even computationally essential, to select a single, most appropriate model. Recently it has been shown, in the context of linear latent variable models, that hierarchical priors governed by continuous hyper-parameters, whose values are set by type-II maximum likelihood, can be used to optimize model complexity. In this paper we extend this framework to mixture distributions by considering the classical task of density estimation using mixtures of Gaussians. We show that, by setting the mixing coefficients to maximize the marginal log-likelihood, unwanted components can be suppressed, and the appropriate number of components for the mixture can be determined in a single training run without recourse to cross-validation. Our approach uses a variational treatment based on a factorized approximation to the posterior distribution.
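The component-suppression idea can be illustrated with a minimal sketch. This is not the paper's variational algorithm: it is plain EM for a one-dimensional Gaussian mixture, started with more components than the data support, where any component whose mixing coefficient collapses below a small threshold is pruned and the remaining coefficients renormalized. The function name `em_gmm_prune`, the threshold `prune_tol`, and the synthetic two-cluster data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data drawn from two well-separated Gaussians (illustrative).
data = np.concatenate([rng.normal(-4.0, 1.0, 200),
                       rng.normal(4.0, 1.0, 200)])

def em_gmm_prune(x, n_components=6, n_iter=200, prune_tol=1e-2):
    """EM for a 1-D Gaussian mixture; components whose mixing
    coefficient falls below prune_tol are discarded (a crude stand-in
    for the automatic suppression described in the abstract)."""
    n = len(x)
    mu = rng.choice(x, size=n_components)          # random initial means
    var = np.full(n_components, x.var())           # broad initial variances
    pi = np.full(n_components, 1.0 / n_components) # uniform mixing coefficients
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k], computed in log space for stability.
        log_p = (-0.5 * (x[:, None] - mu) ** 2 / var
                 - 0.5 * np.log(2.0 * np.pi * var) + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing coefficients, means, and variances.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        # Prune near-empty components and renormalize the coefficients.
        keep = pi > prune_tol
        if not keep.all():
            pi, mu, var = pi[keep], mu[keep], var[keep]
            pi /= pi.sum()
    return pi, mu, var

pi, mu, var = em_gmm_prune(data)
print(f"components kept: {len(pi)}")
```

In the paper's actual method the mixing coefficients are set by maximizing the marginal log-likelihood under a variational factorized posterior, which drives redundant coefficients towards zero automatically rather than via a hard threshold; the sketch above only mimics the resulting pruning behaviour.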