The Role of Embodied Intention in Early Lexical Acquisition

We examine how inferring interlocutors' referential intentions from their body movements influences the early stages of lexical acquisition. By testing human participants and comparing their performance across learning conditions, we find that these embodied intentions facilitate both word discovery and word-meaning association. In light of these empirical findings, the main part of this article presents a computational model that identifies the sound patterns of individual words in continuous speech using nonlinguistic contextual information, and that employs body movements as deictic references to discover word-meaning associations. To our knowledge, this is the first model of word learning that both learns lexical items from raw multisensory signals, closely resembling infant language development in natural environments, and explores the computational role of social cognitive skills in lexical acquisition.
