Policy gradient reinforcement learning for fast quadrupedal locomotion

This paper presents a machine learning approach to optimizing a quadrupedal trot gait for forward speed. Given a parameterized walk designed for a specific robot, we propose using a form of policy gradient reinforcement learning to automatically search the space of possible parameters, with the goal of finding the fastest possible walk. We implement and test our approach on a commercially available quadrupedal robot platform, the Sony Aibo. After about three hours of learning, all on the physical robots and with no human intervention other than battery changes, the robots achieved a gait faster than any previously known gait for the Aibo, significantly outperforming a variety of existing hand-coded and learned solutions.
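The abstract does not spell out the update rule, so the following is only a minimal sketch of one finite-difference style of policy gradient search over a parameter vector, not necessarily the exact procedure used in the paper. The function names, the perturbation sizes `epsilons`, the step size, and the `toy_speed` objective are all hypothetical; on a real robot the score would be a measured walking speed rather than a formula.

```python
import random


def estimate_gradient(params, score_fn, epsilons, num_policies, rng):
    """Estimate an ascent direction from randomly perturbed parameter vectors.

    Assumption: each parameter is perturbed by -eps, 0, or +eps at random,
    and the scores are averaged per dimension by perturbation sign.
    """
    n = len(params)
    buckets = [{-1: [], 0: [], 1: []} for _ in range(n)]
    for _ in range(num_policies):
        signs = [rng.choice((-1, 0, 1)) for _ in range(n)]
        candidate = [p + s * e for p, s, e in zip(params, signs, epsilons)]
        score = score_fn(candidate)  # on hardware: time a walk with these parameters
        for d, s in enumerate(signs):
            buckets[d][s].append(score)

    gradient = []
    for d in range(n):
        avg = {s: (sum(v) / len(v) if v else None) for s, v in buckets[d].items()}
        if avg[1] is None or avg[-1] is None:
            gradient.append(0.0)  # not enough samples to compare both directions
        elif avg[0] is not None and avg[0] > avg[1] and avg[0] > avg[-1]:
            gradient.append(0.0)  # leaving this parameter unchanged looked best
        else:
            gradient.append(avg[1] - avg[-1])
    return gradient


def optimize(params, score_fn, epsilons, step_size=0.2,
             num_policies=30, iterations=100, seed=0):
    """Repeatedly take a fixed-length step along the estimated gradient."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(iterations):
        grad = estimate_gradient(params, score_fn, epsilons, num_policies, rng)
        norm = sum(g * g for g in grad) ** 0.5
        if norm == 0.0:
            continue  # no usable direction this round; sample again
        params = [p + step_size * g / norm for p, g in zip(params, grad)]
    return params


def toy_speed(params):
    """Hypothetical stand-in for measured speed: peaks at a fixed target."""
    target = [3.0, -2.0, 1.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    best = optimize(start, toy_speed, epsilons=[0.1, 0.1, 0.1])
    print("start score:", toy_speed(start))
    print("final score:", toy_speed(best))
```

In this sketch the only interaction with the system being optimized is the call to `score_fn`, which is what would make a search of this kind practical to run directly on physical hardware: each evaluation corresponds to trying one candidate parameter set and measuring how fast the robot walks.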
