Probabilistic models of text and images

Managing large and growing collections of information is a central goal of modern computer science. Data repositories of texts, images, sounds, and genetic information have become widely accessible, thus necessitating good methods of retrieval, organization, and exploration. In this thesis, we describe a suite of probabilistic models of information collections for which the above problems can be cast as statistical queries. We use directed graphical models as a flexible, modular framework for describing appropriate modeling assumptions about the data. Fast approximate posterior inference algorithms based on variational methods free us from having to specify tractable models, and further allow us to take the Bayesian perspective, even in the face of large datasets. With this framework in hand, we describe latent Dirichlet allocation (LDA), a graphical model particularly suited to analyzing text collections. LDA posits a finite index of hidden topics which describe the underlying documents. New documents are situated into the collection via approximate posterior inference of their associated index terms. Extensions to LDA can index a set of images, or multimedia collections of interrelated text and images. Finally, we describe nonparametric Bayesian methods for relaxing the assumption of a fixed number of topics, and develop models based on the natural assumption that the size of the index can grow with the collection. This idea is extended to trees, and to models which represent the hidden structure and content of a topic hierarchy that underlies a collection.
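The generative process that LDA assumes can be illustrated with a minimal sketch, which is not the thesis's implementation. It assumes the standard formulation: each document draws topic proportions from a Dirichlet prior, and each word draws a topic from those proportions and then a word from that topic's distribution. The names `sample_dirichlet`, `sample_categorical`, and `generate_document`, and the toy vocabulary of word indices, are hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(topics, alpha, n_words, rng):
    """Generate one document under the assumed LDA generative process.

    topics:  one word distribution per topic (hypothetical toy values)
    alpha:   Dirichlet parameter over topic proportions
    Returns the topic proportions, per-word topic assignments, and words.
    """
    theta = sample_dirichlet(alpha, rng)       # document-level topic proportions
    assignments, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)     # hidden topic for this word
        w = sample_categorical(topics[z], rng) # observed word index
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy topics over a four-word vocabulary (illustrative assumption).
    topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    theta, z, w = generate_document(topics, [0.5, 0.5], 10, rng)
    print(theta, z, w)
```

Posterior inference runs this process in reverse: given only the observed words, it estimates the hidden proportions and topic assignments, which is the step the abstract describes as situating a new document into the collection.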
