Towards Real Robot Learning in the Wild: A Case Study in Bipedal Locomotion

Algorithms for self-learning systems have made considerable progress in recent years, yet safety concerns and the need for additional instrumentation have so far largely limited learning experiments with real robots to well-controlled lab settings. In this paper, we demonstrate how a small bipedal robot can autonomously learn to walk with minimal human intervention and with minimal instrumentation of the environment. We employ data-efficient off-policy deep reinforcement learning to learn to walk end-to-end, directly on hardware, using rewards that are computed exclusively from proprioceptive sensing. To allow the robot to autonomously adapt its behaviour to its environment, we additionally provide the agent with raw RGB camera images as input. By deploying two robots in different geographic locations while sharing data in a distributed learning setup, we achieve higher throughput and greater diversity of the training data. Our learning experiments constitute a step towards the long-term vision of learning "in the wild" for legged robots, and, to our knowledge, represent the first demonstration of learning a deep neural network controller for bipedal locomotion directly on hardware.
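The distributed setup described above can be sketched in miniature: two actors (one per robot) push transitions into a single shared replay buffer, and an off-policy learner samples mini-batches from the union of their experience. This is a hypothetical illustration only; the names `ReplayBuffer` and `actor_step` are invented for this sketch and do not reflect the paper's actual implementation, which builds on distributed RL infrastructure such as Reverb and Acme.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO replay buffer shared by all actors (sketch, not the paper's code)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling; off-policy methods can learn from data
        # gathered by any actor, regardless of where it was collected.
        return random.sample(self.storage, batch_size)


def actor_step(robot_id, step):
    # Stand-in for one hardware interaction: in the real system this
    # would be (observation, action, reward, next_observation).
    return {"robot": robot_id, "step": step, "reward": 0.0}


buffer = ReplayBuffer(capacity=10_000)

# Two robots in different geographic locations feed the same buffer.
for step in range(100):
    buffer.add(actor_step("robot_A", step))
    buffer.add(actor_step("robot_B", step))

# The learner trains on mixed experience from both robots, which is
# what yields the higher throughput and greater data diversity.
batch = buffer.sample(32)
robots_in_batch = {t["robot"] for t in batch}
```

Because the learning algorithm is off-policy, transitions collected by either robot remain valid training data for a single shared policy, so adding a second robot scales data collection without changing the learner.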
