Learning to Walk in 20 Minutes

We present a statistical gradient-following algorithm that optimizes a control policy for bipedal walking online, on a real robot. A distinguishing feature of the system is that learning and execution occur simultaneously: there are no explicit learning trials, and there is no need to model the dynamics of the robot in simulation. Thanks in part to the mechanical design of the robot, the system reliably acquires a robust policy for dynamic bipedal walking from a blank slate in under 20 minutes. Once the robot begins walking, it quickly and continually adapts to the terrain with every step it takes.
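The "learn while executing" setup described above belongs to the family of stochastic policy-gradient methods, which update a parameterized stochastic policy after every step from a scalar reward signal. The sketch below is a toy illustration under stated assumptions, not the paper's walking controller: a hypothetical linear-Gaussian policy on a made-up one-dimensional tracking task (the ideal action for state `s` is `2*s`), with the exploration noise `sigma` and learning rate `alpha` chosen arbitrarily. It shows the core mechanism: sample an action, observe a reward, and nudge the weights along the gradient of the log-probability of the sampled action, with no separate learning trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch of an online stochastic policy-gradient update.
# The policy is a Gaussian over a 1-D action whose mean is linear
# in the state; weights start from a blank slate and are updated
# once per step while the policy is executing.

w = np.zeros(2)      # policy weights, learned from scratch
sigma = 0.3          # fixed exploration noise (assumed value)
alpha = 0.02         # learning rate (assumed value)

def reward(s, a):
    # Hypothetical stand-in for the task's reward signal:
    # penalize squared distance from the ideal action 2*s.
    return -(a - 2.0 * s) ** 2

for step in range(20000):
    s = rng.uniform(-1.0, 1.0)             # observe state
    x = np.array([s, 1.0])                 # features: state + bias
    mu = w @ x                             # policy mean
    a = mu + sigma * rng.normal()          # act stochastically (explore)
    r = reward(s, a)                       # scalar feedback
    grad_log_pi = (a - mu) / sigma**2 * x  # grad of log N(a; mu, sigma^2)
    w += alpha * r * grad_log_pi           # one update per step, no trials

print(w)  # w[0] should approach 2.0 and w[1] should approach 0.0
```

Because every step both executes the current policy and improves it, the estimator needs no simulated model of the plant, only the reward observed online; this is the same structural property the abstract highlights for the walking robot.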
