Reinforcement learning of full-body humanoid motor skills

Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and their state and action spaces are continuous. Most reinforcement learning algorithms therefore become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and is numerically robust in high-dimensional learning problems. We demonstrate how PI2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.
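To make the update rule concrete, the following is a minimal, illustrative sketch of a PI2-style parameter update in Python. It is not the full algorithm described in the paper: it omits the time-dependent cost-to-go weighting and the projection of the noise onto the basis functions of the movement primitive, and the function name `pi2_update`, the sensitivity parameter `h`, and the toy quadratic cost are assumptions made here for illustration. The core idea it shows is the one stated above: each noisy rollout is weighted by an exponentiated (softmax) function of its cost, and the parameters are updated with the probability-weighted average of the exploration noise, so no gradient of a model is needed.

```python
import numpy as np

def pi2_update(theta, costs, epsilons, h=10.0):
    """
    One simplified PI2-style parameter update (illustrative sketch).

    theta    : (D,)   current policy parameters (e.g. primitive weights for one joint)
    costs    : (K,)   total cost S(tau_k) of each of the K noisy rollouts
    epsilons : (K, D) exploration noise added to theta in each rollout
    h        : softmax sensitivity; larger h favors low-cost rollouts more sharply
    """
    costs = np.asarray(costs, dtype=float)
    epsilons = np.asarray(epsilons, dtype=float)

    # Exponentiate the min-max normalized costs so the best rollout gets a
    # weight near 1 and the worst near 0 (the path-integral "probabilities").
    c_min, c_max = costs.min(), costs.max()
    denom = (c_max - c_min) if c_max > c_min else 1e-10
    weights = np.exp(-h * (costs - c_min) / denom)
    weights /= weights.sum()

    # Update: probability-weighted average of the exploration noise.
    return theta + weights @ epsilons


# Usage sketch: drive a toy 5-D parameter vector toward the minimum of a
# quadratic cost using only noisy rollouts (no gradients, no model).
rng = np.random.default_rng(0)
theta = np.zeros(5)
for _ in range(100):
    eps = rng.normal(scale=0.3, size=(20, 5))          # K = 20 noisy rollouts
    costs = np.sum((theta + eps - 1.0) ** 2, axis=1)   # stand-in for rollout cost
    theta = pi2_update(theta, costs, eps)
print(theta)  # converges toward the cost minimum at 1.0 in each dimension
```

In the robot setting described in the paper, the same weighted-averaging step is applied per time step and per joint, which is what keeps the method tractable as the number of degrees of freedom grows.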
