Reinforcement Learning to Adjust Parametrized Motor Primitives to New Situations

Humans adapt learned movements very quickly to new situations by generalizing behaviors learned in similar situations. In contrast, robots currently often need to re-learn the complete movement. In this paper, we propose a method that learns to generalize parametrized motor plans by adapting a small set of global parameters, called meta-parameters. We employ reinforcement learning to learn the meta-parameters required to cope with the current situation, which is described by states. We introduce a suitable reinforcement learning algorithm based on a kernelized version of reward-weighted regression. To show its feasibility, we evaluate this algorithm on a toy example and compare it to several previous approaches. Subsequently, we apply the approach to three robot tasks, namely the generalization of throwing movements in darts, of hitting movements in table tennis, and of ball throwing, with the tasks learned on several different real physical robots: a Barrett WAM, a BioRob, the JST-ICORP/SARCOS CBi, and a KUKA KR 6.
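The meta-parameter learner described above can be sketched as a kernelized reward-weighted regression: each observed state is paired with the meta-parameters that were tried there and the cost incurred, and a new state's meta-parameters are predicted by a kernel regression in which low-cost samples carry more weight. The snippet below is a minimal illustrative sketch under these assumptions, not the paper's exact algorithm; the function names, the Gaussian kernel, and the regularization constant `lam` are choices made for this example.

```python
import numpy as np

def gaussian_kernel(A, B, bw=1.0):
    # Squared-exponential kernel between the row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

def crkr_predict(S, Gamma, costs, s_query, lam=0.5, bw=1.0):
    """Cost-regularized kernel regression sketch.

    S       : (n, d_state)  visited states
    Gamma   : (n, d_meta)   meta-parameters tried in those states
    costs   : (n,)          cost of each sample (lower = better)
    s_query : (d_state,)    new state to generalize to
    """
    K = gaussian_kernel(S, S, bw)
    C = np.diag(costs)                 # high-cost samples are down-weighted
    k = gaussian_kernel(s_query[None, :], S, bw)   # (1, n) similarities
    w = np.linalg.solve(K + lam * C, Gamma)        # (n, d_meta) weights
    return (k @ w).ravel()             # predicted meta-parameters, (d_meta,)
```

On toy data where the ideal meta-parameter grows with the state (e.g. a farther dart target needing a faster release), the prediction interpolates between the low-cost samples and increases with the queried state.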
