Learning variable impedance control

One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI² (Policy Improvement with Path Integrals). PI² is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI² algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on the cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI² is that it can scale to problems of many DOFs, so that reinforcement learning on real robotic systems becomes feasible. We sketch the PI² algorithm and its theoretical properties, and how it is applied to gain scheduling for variable impedance control. We evaluate our approach by presenting results on several simulated and real robots. We consider tasks involving accurate tracking through via points, and manipulation tasks requiring physical contact with the environment. In these tasks, the optimal strategy requires tuning both a reference trajectory and the impedance of the end-effector. The results show that we can use path-integral-based reinforcement learning not only for planning but also to derive variable-gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
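The core of the PI² update described above is simple to sketch: perturb the policy parameters with exploration noise, roll out each perturbation, and average the perturbations weighted by the exponentiated (negated) rollout costs. The following is a minimal, self-contained illustration of that probability-weighted update, not the paper's implementation; the parameter names (`h`, `noise_std`), the per-update normalization of costs, and the toy one-dimensional "gain" cost are all assumptions made for the sake of the example.

```python
import numpy as np

def pi2_update(theta, rollout_cost, n_rollouts=20, noise_std=0.1, h=10.0, rng=None):
    """One PI2-style parameter update (illustrative sketch).

    theta        : current policy parameters (e.g. gain-schedule weights)
    rollout_cost : maps a perturbed parameter vector to a scalar trajectory cost
    h            : sensitivity of the soft-max cost weighting (assumed default)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample exploration noise and evaluate each perturbed parameter vector.
    eps = rng.normal(scale=noise_std, size=(n_rollouts, theta.size))
    costs = np.array([rollout_cost(theta + e) for e in eps])
    # Rescale costs to [0, 1] so the exponential weighting is well conditioned.
    s = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-12)
    w = np.exp(-h * s)
    w /= w.sum()
    # Probability-weighted average of the explored perturbations.
    return theta + w @ eps

# Toy task: learn a single "gain" parameter whose optimal value is 2.0.
rng = np.random.default_rng(0)
theta = np.zeros(1)
for _ in range(200):
    theta = pi2_update(theta, lambda t: float((t[0] - 2.0) ** 2), rng=rng)
```

Because only cost evaluations are needed, no gradient of the rollout cost is ever computed, which is what makes this style of update usable with physical rollouts on a real robot.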
