A Bayesian Review of the Poisson-Dirichlet Process

The two-parameter Poisson-Dirichlet process, also known as the Pitman-Yor process and related to the Chinese restaurant process, is a generalisation of the Dirichlet process and is increasingly used for probabilistic modelling in discrete domains such as language and images. This article reviews the theory of the Poisson-Dirichlet process in terms of its consistency for estimation, its convergence rates and the posteriors given data. This theory has been well developed for continuous distributions (more generally referred to as nonatomic distributions). This article then presents a Bayesian interpretation of the Poisson-Dirichlet process: it is a mixture using an improper and infinite-dimensional Dirichlet distribution. This interpretation requires technicalities of priors, posteriors and Hilbert spaces, but conceptually it means we can understand the process as just another Dirichlet, and thus all its sampling properties fit naturally. Finally, this article presents results for the discrete case, which is the case now seeing widespread use in computer science but which has received less attention in the literature.
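
The relationship to the Chinese restaurant process mentioned above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the article, of the standard two-parameter seating rule: with n customers seated at K tables, the next customer joins table k with probability (n_k - d)/(n + a) and opens a new table with probability (a + K d)/(n + a). The function name `sample_crp_partition` and the parameter names `discount` (d) and `concentration` (a) are assumptions chosen for illustration. Setting `discount` to 0 gives the seating rule associated with the Dirichlet process.

```python
import random


def sample_crp_partition(n_customers, discount, concentration, seed=None):
    """Sample table sizes from a two-parameter Chinese restaurant process.

    Hypothetical helper for illustration only. Assumes the usual
    parameter range: 0 <= discount < 1 and concentration > -discount.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of customers at table k

    for n in range(n_customers):
        if n == 0:
            # The first customer always opens the first table.
            table_sizes.append(1)
            continue

        # Unnormalised weights: (n_k - d) for each occupied table,
        # then (a + K * d) for opening a new table. They sum to n + a.
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * len(table_sizes))

        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1

    return table_sizes


if __name__ == "__main__":
    sizes = sample_crp_partition(1000, discount=0.5, concentration=1.0, seed=0)
    print(len(sizes), "tables; largest has", max(sizes), "customers")
```

A larger `discount` tends to produce more tables with a heavier tail of small tables, which is the power-law behaviour that motivates the use of this process for language data.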
