Statistical Models for Co-occurrence Data

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities of pairs that were rarely observed, or not observed at all, in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. We propose a novel family of mixture models that explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, we derive EM algorithms to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence, and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
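
To make the idea of fitting a mixture over shared aspects concrete, the following is a minimal sketch of plain EM on a co-occurrence count matrix. It assumes one simple symmetric parameterization, P(x, y) = sum_a P(a) P(x|a) P(y|a), which the abstract does not spell out. The function name `fit_aspect_model`, its parameters, and the toy data are hypothetical, and the sketch does not include the improved EM variants mentioned above.

```python
import numpy as np

def fit_aspect_model(counts, n_aspects, n_iter=50, seed=0):
    """Plain EM for an assumed mixture P(x, y) = sum_a P(a) P(x|a) P(y|a).

    counts[x, y] is the number of times the pair (x, y) was observed.
    Returns P(a), P(x|a), P(y|a) and the log-likelihood before each update.
    """
    rng = np.random.default_rng(seed)
    n_x, n_y = counts.shape

    # Random positive initialization, normalized to valid distributions.
    p_a = np.full(n_aspects, 1.0 / n_aspects)
    p_x_a = rng.random((n_aspects, n_x)) + 0.1
    p_x_a /= p_x_a.sum(axis=1, keepdims=True)
    p_y_a = rng.random((n_aspects, n_y)) + 0.1
    p_y_a /= p_y_a.sum(axis=1, keepdims=True)

    tiny = 1e-300  # guards against division by zero and log(0)
    log_liks = []
    for _ in range(n_iter):
        # joint[a, x, y] = P(a) P(x|a) P(y|a)
        joint = p_a[:, None, None] * p_x_a[:, :, None] * p_y_a[:, None, :]
        p_xy = joint.sum(axis=0)
        log_liks.append(float((counts * np.log(p_xy + tiny)).sum()))

        # E-step: posterior P(a | x, y) for every pair.
        post = joint / (p_xy + tiny)

        # M-step: re-estimate parameters from expected counts.
        weighted = counts[None, :, :] * post
        mass = weighted.sum(axis=(1, 2))
        p_a = mass / mass.sum()
        p_x_a = weighted.sum(axis=2) / mass[:, None]
        p_y_a = weighted.sum(axis=1) / mass[:, None]

    return p_a, p_x_a, p_y_a, log_liks

# Toy data (hypothetical): two blocks of objects that mostly co-occur
# within their own block, with many sparse or zero cells.
counts = np.array([
    [8, 6, 0, 1],
    [7, 9, 1, 0],
    [0, 1, 5, 7],
    [1, 0, 6, 8],
], dtype=float)

p_a, p_x_a, p_y_a, log_liks = fit_aspect_model(counts, n_aspects=2)

# Smoothed estimate for every pair, including pairs never observed.
p_xy = np.einsum("a,ax,ay->xy", p_a, p_x_a, p_y_a)
```

Because every pair probability is built from shared aspect parameters, a pair with a zero count can still receive a nonzero estimate in `p_xy`, which is how a model of this kind addresses data sparseness. The result depends on the random initialization, reflecting the locality of EM-based optimization noted above.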
