Compositional Obverter Communication Learning From Raw Visual Input

One of the distinguishing aspects of human language is its compositionality, which allows us to describe complex environments with a limited vocabulary. Previous work has shown that neural network agents can learn to communicate in a highly structured, possibly compositional language based on disentangled input (e.g., hand-engineered features). Humans, however, do not learn to communicate based on well-summarized features. In this work, we train neural agents to simultaneously develop visual perception from raw image pixels and learn to communicate with a sequence of discrete symbols. The agents play an image description game in which the image contains factors such as colors and shapes. We train the agents using the obverter technique, in which an agent introspects to generate messages that maximize its own understanding. Through qualitative analysis, visualization, and a zero-shot test, we show that the agents can develop, out of raw image pixels, a language with compositional properties, given proper pressure from the environment.
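The obverter idea described above can be illustrated with a minimal sketch: the speaker builds a message one symbol at a time, each time choosing the symbol that most increases its own listener-side confidence that the message describes the target. Everything named here is hypothetical and chosen only for illustration: `obverter_message`, `toy_listener`, `TOY_LEXICON`, the four-symbol vocabulary, and the `max_len` and `threshold` defaults. The agents in this work use neural networks over raw pixels, which the hand-written toy scorer below merely stands in for.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy lexicon: each symbol names one attribute.
TOY_LEXICON = {"a": "red", "b": "blue", "c": "box", "d": "sphere"}


def toy_listener(message: Sequence[str], target: Tuple[str, ...]) -> float:
    """Stand-in for an agent's own listener: a score in [0, 1] for how well
    the message describes the target (assumption, not a learned model)."""
    named = {TOY_LEXICON[s] for s in message}
    hits = len(named & set(target))      # target attributes the message names
    misses = len(named - set(target))    # attributes named but not in target
    return max(0.0, (hits - misses) / len(target))


def obverter_message(
    listener_confidence: Callable[[Sequence[str], Tuple[str, ...]], float],
    target: Tuple[str, ...],
    vocab: Sequence[str],
    max_len: int = 10,        # illustrative cap on message length
    threshold: float = 0.95,  # illustrative stopping confidence
) -> List[str]:
    """Greedily extend the message with the symbol that maximizes the
    speaker's own confidence, stopping once it is confident enough."""
    message: List[str] = []
    for _ in range(max_len):
        # Score every one-symbol extension with the agent's own listener.
        scored = [(listener_confidence(message + [s], target), s) for s in vocab]
        best_score, best_symbol = max(scored, key=lambda pair: pair[0])
        message.append(best_symbol)
        if best_score >= threshold:
            break
    return message


if __name__ == "__main__":
    print(obverter_message(toy_listener, ("red", "box"), list("abcd")))
```

With the toy scorer, the message for a red box ends up using one symbol per attribute, which is the kind of compositional reuse the abstract refers to. In this sketch that structure comes from the hand-written lexicon rather than from learning.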
