Model-Free Reinforcement Learning of Impedance Control in Stochastic Environments

For humans and robots, variable impedance control is an essential component for ensuring robust and safe physical interaction with the environment. Humans learn to adapt their impedance to specific tasks and environments, a capability we continue to develop and refine well into our twenties. In this article, we reproduce functionally interesting aspects of learned impedance control in humans on a simulated robot platform. As demonstrated in numerous force field tasks, humans combine two strategies to adapt their impedance to perturbations, thereby minimizing position error and energy consumption: 1) if perturbations are unpredictable, subjects increase their impedance through cocontraction; and 2) if perturbations are predictable, subjects learn a feed-forward command to offset the perturbation. We show that a simulated 7-DOF robot, trained with our model-free reinforcement learning algorithm PI2, exhibits similar behavior when deterministic and stochastic force fields are applied to its end-effector, and that the resulting robot movements are qualitatively similar to human movements. Our results provide a biologically plausible approach to learning appropriate impedances purely from experience, without requiring a model of either body or environment dynamics. Not requiring models also facilitates autonomous development for robots, as prespecified models cannot be provided for each environment a robot might encounter.
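The abstract does not spell out the PI2 update itself. As a rough, hedged illustration of the reward-weighted-averaging idea behind PI2-style policy improvement, the sketch below perturbs policy parameters (e.g. gains of a variable-impedance controller), evaluates each noisy rollout with a black-box cost (standing in for position error plus energy consumption), and averages the exploration noise with softmax weights. All names, the toy cost, and the hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy run is reproducible

def pi2_update(theta, rollout_cost, n_rollouts=10, noise_std=0.1, lam=0.1):
    """One PI2-style policy improvement step (illustrative sketch).

    theta        : current policy parameter vector
    rollout_cost : black-box cost of one noisy rollout (no model needed)
    lam          : temperature of the softmax cost weighting
    """
    # Explore: sample parameter noise for each rollout.
    eps = rng.normal(0.0, noise_std, size=(n_rollouts, theta.size))
    costs = np.array([rollout_cost(theta + e) for e in eps])
    # Softmax weighting: low-cost rollouts dominate the update.
    s = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-10)
    w = np.exp(-s / lam)
    w /= w.sum()
    # Reward-weighted averaging of the exploration noise.
    return theta + w @ eps

# Toy usage: drive two "gain" parameters toward a target setting by
# minimizing a quadratic cost, purely from sampled rollouts.
target = np.array([1.0, -0.5])
theta = np.zeros(2)
for _ in range(200):
    theta = pi2_update(theta, lambda th: np.sum((th - target) ** 2))
```

The update is gradient-free: only scalar rollout costs are used, which is what lets the approach work without a model of body or environment dynamics.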
