Extending Machine Language Models toward Human-Level Language Understanding

Language is central to human intelligence. We review recent breakthroughs in machine language processing and consider what remains to be achieved. Recent approaches rely on domain-general principles of learning and representation captured in artificial neural networks. Most current models, however, focus too closely on language itself. In humans, language is part of a larger system for acquiring, representing, and communicating about objects and situations in the physical and social world, and future machine language models should emulate such a system. We describe existing machine models linking language to concrete situations, and point toward extensions to address more abstract cases. Human language processing exploits complementary learning systems, including a deep neural network-like learning system that learns gradually, as machine systems do, as well as a fast-learning system that supports learning new information quickly. Adding such a system to machine language models will be an important further step toward truly human-like language understanding.
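The complementary-learning-systems idea mentioned above can be sketched in a few lines of code. This is an illustrative toy, not the authors' model: a slow system nudges stored estimates by small increments over many exposures (as gradient-trained networks do), a fast system memorizes an item after a single exposure (as the hippocampus is thought to), and replay from the fast store gradually consolidates knowledge into the slow system. All class and method names here are hypothetical.

```python
class SlowLearner:
    """Gradual, neural-network-like learning: each exposure nudges the
    stored estimate a small step toward the observed value."""
    def __init__(self, lr=0.1):
        self.estimates = {}
        self.lr = lr

    def observe(self, key, value):
        old = self.estimates.get(key, 0.0)
        self.estimates[key] = old + self.lr * (value - old)

    def recall(self, key):
        return self.estimates.get(key, 0.0)


class FastLearner:
    """Hippocampus-like one-shot learning: one exposure suffices."""
    def __init__(self):
        self.memory = {}

    def observe(self, key, value):
        self.memory[key] = value

    def recall(self, key):
        return self.memory.get(key)


class ComplementarySystem:
    """New items go into the fast store immediately; replaying that store
    into the slow learner stands in for offline consolidation."""
    def __init__(self):
        self.slow = SlowLearner()
        self.fast = FastLearner()

    def observe(self, key, value):
        self.fast.observe(key, value)

    def consolidate(self, epochs=50):
        # Interleaved replay: repeatedly present stored episodes
        # to the slow learner so it absorbs them gradually.
        for _ in range(epochs):
            for key, value in self.fast.memory.items():
                self.slow.observe(key, value)

    def recall(self, key):
        hit = self.fast.recall(key)
        return hit if hit is not None else self.slow.recall(key)
```

The point of the sketch is the division of labor: the fast store supports immediate recall of a new fact, while repeated replay is what moves that fact into the slowly adapting system without requiring the slow system to learn anything in one shot.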
