Scaling simulation-to-real transfer by learning composable robot skills

We present a novel solution to the problem of simulation-to-real transfer, which builds on recent advances in robot skill decomposition. Rather than focusing on minimizing the simulation-reality gap, we learn a set of diverse policies that are parameterized in a way that makes them easily reusable. This diversity and parameterization of low-level skills allow us to find a transferable policy that can use combinations and variations of different skills to solve more complex, high-level tasks. In particular, we first use simulation to jointly learn a policy for a set of low-level skills and a "skill embedding" parameterization that can be used to compose them. Later, we learn high-level policies that actuate the low-level policies via this skill embedding parameterization. The high-level policies encode how and when to reuse the low-level skills together to achieve specific high-level tasks. Importantly, our method learns to control a real robot in joint space to achieve these high-level tasks with little or no on-robot time, even though the low-level policies may not transfer perfectly from simulation to reality and were never trained on examples of the high-level tasks. We illustrate the principles of our method with informative simulation experiments. We then verify its usefulness for real robotics problems by learning, transferring, and composing free-space and contact motion skills on a Sawyer robot using only joint-space control. We experiment with several techniques for composing pre-learned skills, and find that our method supports both learning-based approaches and efficient search-based planning for achieving high-level tasks using only pre-learned skills.
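
To make the two-level composition scheme concrete, the sketch below shows one way a frozen low-level skill policy conditioned on a latent skill vector z could be driven by a high-level policy that only emits latents. This is a minimal illustration under stated assumptions, not the paper's implementation: the class names, dimensions, the linear stand-in for the skill network, and the fixed skill-switching schedule are all hypothetical.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's code) of composing
# pre-learned skills through a latent "skill embedding".

LATENT_DIM = 4   # assumed dimensionality of the skill embedding z
STATE_DIM = 14   # assumed robot state, e.g. joint positions + velocities
ACTION_DIM = 7   # e.g. joint commands for a 7-DoF arm such as the Sawyer


class LowLevelPolicy:
    """Frozen skill policy pi_low(a | s, z), pre-trained in simulation."""

    def __init__(self, rng):
        # Stand-in for learned network weights.
        self.W = rng.standard_normal((ACTION_DIM, STATE_DIM + LATENT_DIM))

    def act(self, state, z):
        # A real implementation would run a neural network on (state, z);
        # a fixed linear map keeps the sketch self-contained.
        return np.tanh(self.W @ np.concatenate([state, z]))


class HighLevelPolicy:
    """pi_high(z | s): emits latent skill vectors, never joint commands.

    This role could be filled by a learned policy or a search-based
    planner over the embedding space; here it simply cycles through a
    fixed set of candidate latents for illustration.
    """

    def __init__(self, candidate_latents):
        self.candidates = candidate_latents

    def choose_latent(self, state, t):
        # Hypothetical schedule: switch skills every 50 steps.
        return self.candidates[(t // 50) % len(self.candidates)]


def rollout(env, pi_high, pi_low, horizon=200):
    """Compose skills: pi_high selects z, the frozen pi_low executes it."""
    state = env.reset()
    for t in range(horizon):
        z = pi_high.choose_latent(state, t)  # which skill variation to run
        action = pi_low.act(state, z)        # decode z into joint commands
        state, reward, done, _ = env.step(action)
        if done:
            break
```

A search-based composer would replace choose_latent with a search over candidate latents scored against a task cost, while a learning-based composer would train it with reinforcement learning; in both cases only the embedding space is explored and the low-level policy stays fixed.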
