Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Sampling and variational inference are two standard approaches to inference in probabilistic models, but for many problems neither scales effectively to large data sets. An alternative is to relax the probabilistic model into a non-probabilistic formulation that admits a scalable algorithm. This can often be accomplished by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the k-means and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that feature the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis.
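
To make the limiting construction concrete: in the small-variance limit, inference in the exponential family DP mixture reduces, up to constants, to minimizing a k-means-like objective in which the distortion is the Bregman divergence D_\phi induced by the family's log-partition function, penalized by the number of clusters K:

    \min_{K,\,\{\ell_k\},\,\{\mu_k\}} \; \sum_{k=1}^{K} \sum_{x \in \ell_k} D_\phi(x, \mu_k) \; + \; \lambda K .

The Python sketch below is a minimal illustration of the resulting hard clustering procedure, not a definitive implementation of the paper's algorithm: it instantiates D_\phi as squared Euclidean distance (the Gaussian case, recovering the DP-means algorithm), and the penalty lam, the single-cluster initialization, and the max_iter cap are illustrative choices.

import numpy as np

def dp_means(X, lam, max_iter=100):
    # Hard clustering from the small-variance limit of a DP mixture.
    # Here D_phi is squared Euclidean distance (the Gaussian case);
    # other exponential families would swap in their own Bregman
    # divergence. lam is the penalty for opening a new cluster.
    means = [X.mean(axis=0)]          # start with a single global cluster
    z = np.zeros(len(X), dtype=int)   # cluster assignments

    for _ in range(max_iter):
        changed = False
        # Assignment step: reassign each point, opening a new cluster
        # whenever every existing mean is farther away than lam.
        for i, x in enumerate(X):
            d = np.array([np.sum((x - m) ** 2) for m in means])
            k = int(np.argmin(d))
            if d[k] > lam:
                means.append(x.copy())
                k = len(means) - 1
            if z[i] != k:
                z[i], changed = k, True
        # Update step: each mean becomes the centroid of its members.
        for k in range(len(means)):
            members = X[z == k]
            if len(members):
                means[k] = members.mean(axis=0)
        if not changed:
            break
    return z, np.vstack(means)

# Example: z, centers = dp_means(np.random.randn(500, 2), lam=4.0)

Larger values of lam produce fewer clusters; as lam shrinks toward zero, every point eventually opens its own cluster, mirroring the role of the DP concentration parameter in the original mixture model.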
