Exploring the Role of Attention in Modeling Embodied Language Acquisition

Language is about symbols, and those symbols must be learned during infant development. Recently, there has been increased awareness of the essential role that inferring speakers’ referential intentions plays in grounding those symbols. Experiments have shown that these inferences are an important driving force in language learning at a relatively early age. The challenge ahead is to develop formal models of language acquisition that can shed light on the leverage provided by embodiment and attention. This paper describes a computational model of embodied language acquisition that simulates some of the formative steps in infant language acquisition. The novelty of our work is that the model shares multisensory information with a real agent from a first-person perspective, and eye gaze is used as a deictic reference to spot temporal correlations between different modalities. As a result, the system can build meaningful semantic representations that are grounded in the physical world. We test the model’s ability to associate spoken names of objects with their visually grounded meanings, and we compare the results of our approach with those of a variant that does not use referential intentions.

