The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously

This paper introduces the Intentional Unintentional (IU) agent, which endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but also learns faster than agents that target a single task at a time. In some cases where the single-task DDPG method fails completely, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine and introduce a grounded formal language to automatically generate tasks.
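The abstract does not spell out how one agent can learn several tasks from the same experience. A common approach in off-policy multi-task learning, offered here only as an illustrative sketch and not as the authors' implementation, is to score each stored transition under every task's reward and form one temporal-difference target per task. The function name `multitask_td_targets`, its parameters, and the toy numbers below are all hypothetical.

```python
import numpy as np

def multitask_td_targets(rewards, next_q, done, gamma=0.99):
    """One-step TD targets for several tasks from a single shared transition.

    rewards: shape (n_tasks,), the reward each task assigns to the transition
    next_q:  shape (n_tasks,), each task's critic estimate at the next state
    done:    True if the episode terminated at this transition
    gamma:   discount factor (assumed value, not taken from the paper)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma
    return rewards + bootstrap * next_q

# Toy usage: three hypothetical tasks evaluated on the same transition.
targets = multitask_td_targets([1.0, 0.0, 0.0], [2.0, 4.0, 0.0], done=False, gamma=0.5)
terminal_targets = multitask_td_targets([1.0, 0.0, 0.0], [2.0, 4.0, 0.0], done=True, gamma=0.5)
```

Because every task receives a target from each transition, experience gathered while pursuing one task can also update the critics for the others, which is one plausible reading of why simultaneous learning might help.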
