The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to the topics they share at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics, and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
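To make the preferential attachment dynamics concrete, the following is a minimal Python sketch of drawing document paths from the nCRP prior. It assumes the tree is truncated to a fixed number of levels, and the helper name `sample_ncrp_paths` and concentration parameter `gamma` are hypothetical choices for illustration. It covers only the prior over paths, not the topics or the posterior inference algorithm. At each node, a document follows an existing child k with probability n_k / (n + gamma), where n_k counts the earlier documents that passed through child k and n is their total, and it opens a new child with probability gamma / (n + gamma).

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one path of `depth` levels below the root for each document
    from a depth-truncated nested CRP (hypothetical helper, prior only).

    A node is a tuple of child indices from the root: () is the root and
    (0, 2) is the third child of the root's first child.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    rng = random.Random(seed)
    # counts[node][k] = number of earlier documents that went from node to child k
    counts = defaultdict(list)
    paths = []
    for _doc in range(num_docs):
        node = ()
        for _level in range(depth):
            child_counts = counts[node]  # the list object stored in counts
            # Existing child k has weight n_k; a new child has weight gamma.
            weights = child_counts + [gamma]
            k = rng.choices(range(len(weights)), weights=weights)[0]
            if k == len(child_counts):  # open a new branch at this node
                child_counts.append(0)
            child_counts[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(num_docs=20, depth=3, gamma=1.0)[:5]:
        print(path)
```

Because each new document is drawn toward branches that earlier documents already took, a small `gamma` yields a few heavily shared paths, while a larger `gamma` spreads documents across more branches.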
