Quasi-compositional mapping from form to meaning: a neural network-based approach to capturing neural responses during human language comprehension

We argue that natural language can be usefully described as quasi-compositional, and we suggest that deep learning-based neural language models hold long-term promise for capturing how language conveys meaning. We also note that a successful account of human language processing should explain both the outcome of the comprehension process and the continuous internal processes underlying this performance. These points motivate our discussion of a neural network model of sentence comprehension, the Sentence Gestalt model, which we have used to account for the N400 component of the event-related brain potential (ERP), which tracks meaning processing as it happens in real time. The model, which shares features with recent deep learning-based language models, simulates N400 amplitude as the automatic update of a probabilistic representation of the situation or event described by the sentence, corresponding to a temporal difference learning signal at the level of meaning. We suggest that this process occurs relatively automatically, and that a more controlled, attention-dependent process is sometimes necessary for successful comprehension, which may be reflected in the subsequent P600 ERP component. We relate this account to current deep learning models as well as to classic linguistic theory, and use it to illustrate a domain-general perspective on some specific linguistic operations postulated on the basis of compositional analyses of natural language. This article is part of the theme issue ‘Towards mechanistic models of meaning composition’.
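To make the core computational claim concrete, the minimal sketch below illustrates, in toy form, how N400 amplitude can be simulated as the magnitude of the word-by-word update to a distributed "sentence gestalt" representation. Everything here is an illustrative assumption rather than the published model: the layer sizes, the random untrained weights, and names such as simulate_n400 are hypothetical stand-ins for a trained Sentence Gestalt network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; the published model uses different, trained dimensions
# (these values are hypothetical, for illustration only).
VOCAB, GESTALT = 50, 100

# Random weights stand in for a trained network.
W_in = rng.normal(scale=0.1, size=(GESTALT, VOCAB))
W_rec = rng.normal(scale=0.1, size=(GESTALT, GESTALT))

def one_hot(word_id: int) -> np.ndarray:
    """Encode a word as a one-hot input vector."""
    v = np.zeros(VOCAB)
    v[word_id] = 1.0
    return v

def simulate_n400(word_ids):
    """For each incoming word, update the sentence gestalt (an implicit
    probabilistic representation of the described event) and record the
    summed magnitude of that update as the simulated N400."""
    gestalt = np.zeros(GESTALT)
    n400_per_word = []
    for w in word_ids:
        new_gestalt = np.tanh(W_in @ one_hot(w) + W_rec @ gestalt)
        # Semantic update = change across gestalt units. In a trained
        # model this behaves like a temporal-difference error at the
        # level of meaning, since the gestalt after each word is an
        # estimate of the sentence's final interpretation.
        n400_per_word.append(float(np.abs(new_gestalt - gestalt).sum()))
        gestalt = new_gestalt
    return n400_per_word

# Example: four arbitrary word IDs standing in for a short sentence.
print(simulate_n400([3, 17, 42, 8]))
```

In a trained network of this kind, highly expected words change the gestalt little (small simulated N400s) while unexpected but interpretable words change it substantially; the more controlled, attention-dependent process proposed to underlie the P600 is not captured by this automatic update and is not modelled in the sketch.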
