Learning to Walk in 20 Minutes

We present a statistical gradient-following algorithm that optimizes a control policy for bipedal walking online, on a real robot. A distinguishing feature of the system is that learning and execution occur simultaneously: there are no explicit learning trials, and there is no need to model the dynamics of the robot in simulation. Thanks in part to the mechanical design of the robot, the system reliably acquires a robust policy for dynamic bipedal walking from a blank slate in under 20 minutes. Once the robot begins walking, it quickly and continually adapts to the terrain with every step it takes.
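The "learn while executing" setup described above belongs to the family of stochastic policy-gradient methods, which update a parameterized stochastic policy after every step from a scalar reward signal. The sketch below is a toy illustration under stated assumptions, not the paper's walking controller: a hypothetical linear-Gaussian policy on a made-up one-dimensional tracking task (the ideal action for state `s` is `2*s`), with the exploration noise `sigma` and learning rate `alpha` chosen arbitrarily. It shows the core mechanism: sample an action, observe a reward, and nudge the weights along the gradient of the log-probability of the sampled action, with no separate learning trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch of an online stochastic policy-gradient update.
# The policy is a Gaussian over a 1-D action whose mean is linear
# in the state; weights start from a blank slate and are updated
# once per step while the policy is executing.

w = np.zeros(2)      # policy weights, learned from scratch
sigma = 0.3          # fixed exploration noise (assumed value)
alpha = 0.02         # learning rate (assumed value)

def reward(s, a):
    # Hypothetical stand-in for the task's reward signal:
    # penalize squared distance from the ideal action 2*s.
    return -(a - 2.0 * s) ** 2

for step in range(20000):
    s = rng.uniform(-1.0, 1.0)             # observe state
    x = np.array([s, 1.0])                 # features: state + bias
    mu = w @ x                             # policy mean
    a = mu + sigma * rng.normal()          # act stochastically (explore)
    r = reward(s, a)                       # scalar feedback
    grad_log_pi = (a - mu) / sigma**2 * x  # grad of log N(a; mu, sigma^2)
    w += alpha * r * grad_log_pi           # one update per step, no trials

print(w)  # w[0] should approach 2.0 and w[1] should approach 0.0
```

Because every step both executes the current policy and improves it, the estimator needs no simulated model of the plant, only the reward observed online; this is the same structural property the abstract highlights for the walking robot.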
