Unsupervised Learning from Dyadic Data

Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from each set. This includes event co-occurrences, histogram data, and single stimulus preference data as special cases. Dyadic data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework for unsupervised learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures and unifies probabilistic modeling and structure discovery. Mixture models provide both a parsimonious yet flexible parameterization of probability distributions with good generalization performance on sparse data and structural information about data-inherent grouping structure. We propose an annealed version of the standard Expectation Maximization algorithm for model fitting, which is empirically evaluated on a variety of data sets from different domains.
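As a rough illustration of what annealed EM on dyadic data can look like, the following minimal sketch fits a flat latent class model to a co-occurrence count matrix. The abstract does not spell out the model or the annealing schedule, so several things here are assumptions: the factorization P(x, y) = sum_a P(a) P(x|a) P(y|a), the tempering of the E-step posterior by an exponent beta, the schedule `betas`, and the function name `annealed_em` are all hypothetical choices made for illustration.

```python
import numpy as np


def annealed_em(counts, n_classes, betas=(0.5, 0.75, 1.0), n_iter=50, seed=0):
    """Fit an assumed flat latent class model to dyadic counts with tempered EM.

    counts[x, y] is the number of times the dyad (x, y) was observed.
    Returns P(a), P(x|a), P(y|a) as arrays of shape (A,), (A, X), (A, Y).
    """
    rng = np.random.default_rng(seed)
    n_x, n_y = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random, normalized starting parameters.
    p_a = np.full(n_classes, 1.0 / n_classes)
    p_x_a = rng.random((n_classes, n_x))
    p_x_a /= p_x_a.sum(axis=1, keepdims=True)
    p_y_a = rng.random((n_classes, n_y))
    p_y_a /= p_y_a.sum(axis=1, keepdims=True)

    for beta in betas:  # assumed schedule: beta rises to 1, i.e. standard EM
        for _ in range(n_iter):
            # Tempered E-step: class posterior per dyad, with the joint
            # probability raised to the power beta before normalizing.
            joint = p_a[:, None, None] * p_x_a[:, :, None] * p_y_a[:, None, :]
            tempered = joint ** beta
            post = tempered / (tempered.sum(axis=0, keepdims=True) + eps)

            # M-step: re-estimate parameters from posterior-weighted counts.
            weighted = post * counts[None, :, :]
            class_mass = weighted.sum(axis=(1, 2)) + eps
            p_a = class_mass / class_mass.sum()
            p_x_a = weighted.sum(axis=2) / class_mass[:, None]
            p_y_a = weighted.sum(axis=1) / class_mass[:, None]

    return p_a, p_x_a, p_y_a


# Usage on a small block-structured toy matrix (made-up data).
toy_counts = np.array(
    [[8, 7, 0, 0], [9, 6, 0, 0], [0, 0, 5, 9], [0, 0, 7, 6]], dtype=float
)
p_a, p_x_a, p_y_a = annealed_em(toy_counts, n_classes=2)
model = np.einsum("a,ax,ay->xy", p_a, p_x_a, p_y_a)  # fitted joint P(x, y)
```

With beta below 1 the posteriors are smoothed toward uniform, which is one common way to reduce sensitivity to initialization and overfitting on sparse counts; at beta equal to 1 the updates reduce to ordinary EM for this model.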
