Paper information - Toward fast policy search for learning legged locomotion


Legged locomotion is one of the most versatile forms of mobility. However, despite its importance and the large number of legged robotics studies, no biped or quadruped robot to date matches the agility and versatility of its biological counterpart. Approaches to designing controllers for legged locomotion systems are often based either on the assumption of perfectly known dynamics or on mechanical designs that substantially reduce the dimensionality of the problem. The few existing approaches for learning controllers for legged systems either require exhaustive real-world data or improve controllers only conservatively, which leads to slow learning. We present a data-efficient approach to learning feedback controllers for legged locomotive systems, based on learned probabilistic forward models for generating walking policies. On a compass walker, we show that our approach allows for learning gait policies from very little data. Moreover, we analyze learned locomotion models of a biomechanically inspired biped. Our approach has the potential to scale to high-dimensional humanoid robots with little loss in efficiency.
