Modeling Semantic Similarities in Multiple Maps

Laurens van der Maaten, ICT Group, Delft University of Technology
Geoffrey Hinton, Department of Computer Science, University of Toronto

Models that represent words as points in a semantic space are subject to fundamental limitations of metric spaces. These limitations prevent semantic space models from faithfully representing, for example, the pairwise similarities between word meanings as revealed by word association data. In particular, semantic space models cannot faithfully represent intransitive pairwise similarities or the similarities of words that have multiple meanings. In this paper, we present a model that alleviates the limitations of semantic space models by constructing a collection of maps that represent complementary structure in the similarity data. Our model is a variant of a similarity choice model known as Stochastic Neighbor Embedding that constructs multiple maps and allows each object to occur as a point in several different maps. We apply the model to a set of word association data, demonstrating that it can successfully represent intransitive semantic relations as well as words with multiple meanings, and that it outperforms traditional semantic space models in the prediction of word associations. We compare the model to alternative representations of semantic structure, such as topic models and semantic networks.
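The intransitivity argument can be made concrete with a small numerical sketch. The abstract does not give the model's equations, so everything below is an assumption made for illustration: a heavy-tailed kernel 1 / (1 + d^2), a per-map importance weight for each word, the hypothetical helper `multi_map_similarity`, and the toy words and coordinates. In a single metric map, a word that lies close to two others forces those two to lie close to each other (triangle inequality). With two maps and weights, one word can be similar to two others that share no similarity at all.

```python
import numpy as np

def multi_map_similarity(maps, weights):
    """Unnormalized pairwise similarity summed over maps (illustrative only).

    maps:    array of shape (M, N, D), coordinates of N words in M maps
    weights: array of shape (M, N), importance of each word in each map
    """
    # Pairwise squared Euclidean distances within each map: shape (M, N, N)
    diff = maps[:, :, None, :] - maps[:, None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Assumed heavy-tailed kernel; the abstract does not specify one
    kernel = 1.0 / (1.0 + sq_dist)
    # A pair contributes in a map only if both words carry weight there
    pair_weight = weights[:, :, None] * weights[:, None, :]
    sim = (pair_weight * kernel).sum(axis=0)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return sim

# Hypothetical toy data: "tie" has a clothing sense and a knot sense
words = ["tie", "tuxedo", "rope"]
maps = np.array([
    [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]],  # map 0: tie sits next to tuxedo
    [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0]],  # map 1: tie sits next to rope
])
weights = np.array([
    [0.5, 1.0, 0.0],  # map 0: rope has no weight here
    [0.5, 0.0, 1.0],  # map 1: tuxedo has no weight here
])

sim = multi_map_similarity(maps, weights)
for i in range(len(words)):
    for j in range(i + 1, len(words)):
        print(f"{words[i]:>6} ~ {words[j]:<6} {sim[i, j]:.3f}")
```

In this toy setup "tie" is similar to both "tuxedo" and "rope", while "tuxedo" and "rope" have zero similarity because they never carry weight in the same map. A single map that placed both words at distance 0.1 from "tie" could not separate them by more than 0.2.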
