Imitation and Reinforcement Learning for Motor Primitives with Perceptual Coupling

Traditional motor primitive approaches largely yield open-loop policies that can only compensate for small perturbations. In this paper, we present a new type of motor primitive policy that serves as a closed-loop policy, together with an appropriate learning algorithm. Our new motor primitives are an augmented version of the dynamical system-based motor primitives [Ijspeert et al., 2002] that incorporates perceptual coupling to external variables. We show that these motor primitives can perform complex tasks such as the Ball-in-a-Cup or Kendama task even with large variance in the initial conditions, where even a skilled human player would be challenged. We initialize the open-loop policies by imitation learning and the perceptual coupling with a handcrafted solution. We first improve the open-loop policies and subsequently the perceptual coupling using a novel reinforcement learning method that is particularly well-suited for dynamical system-based motor primitives.
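The dynamical system-based motor primitives referenced above can be pictured as a second-order attractor toward a goal, modulated by a learned forcing term; perceptual coupling adds a feedback term driven by an external (perceived) variable, turning the open-loop primitive into a closed-loop policy. The following single-DoF sketch illustrates this structure; all gains, the basis-function parameterization, and the linear coupling form are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dmp_rollout(y0, g, weights, centers, widths, y_ext=None, kappa=0.0,
                tau=1.0, alpha_z=25.0, beta_z=6.25, alpha_x=8.0,
                dt=0.001, T=1.0):
    """Integrate a single-DoF dynamical-system motor primitive.

    weights/centers/widths parameterize the learned forcing term;
    y_ext is an optional per-step external (perceived) variable, and
    kappa scales an illustrative linear coupling to it.
    """
    steps = int(T / dt)
    x, y, z = 1.0, y0, 0.0            # canonical phase, position, scaled velocity
    traj = np.empty(steps)
    for i in range(steps):
        psi = np.exp(-widths * (x - centers) ** 2)        # Gaussian basis functions
        f = x * (g - y0) * psi.dot(weights) / (psi.sum() + 1e-10)
        couple = kappa * (y_ext[i] - y) if y_ext is not None else 0.0
        zdot = (alpha_z * (beta_z * (g - y) - z) + f + couple) / tau
        x += -alpha_x * x / tau * dt   # canonical system: phase decays to 0
        z += zdot * dt
        y += z / tau * dt
        traj[i] = y
    return traj
```

With zero weights and no coupling, the system reduces to a critically damped spring pulling `y` to the goal `g`; learning shapes `weights` (imitation, then reinforcement learning) while `kappa`-style coupling terms are what the paper's perceptual augmentation would adapt.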

[1] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 1992.

[2] Christopher G. Atkeson. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming. NIPS, 1993.

[3] Yasuhiro Masutani, et al. Mastering of a Task with Interaction between a Robot and Its Environment: "Kendama" Task. 1993.

[4] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996.

[5] S. Schaal, et al. A Kendama Learning Robot Based on Bi-directional Theory. Neural Networks, 1996.

[6] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning. 1998.

[7] Richard S. Sutton. Dimensions of Reinforcement Learning. 1998.

[8] Jun Nakanishi, et al. Learning Attractor Landscapes for Learning Motor Primitives. NIPS, 2002.

[9] Jun Nakanishi, et al. Movement Imitation with Nonlinear Dynamical Systems in Humanoid Robots. Proceedings of the 2002 IEEE International Conference on Robotics and Automation, 2002.

[10] Jun Nakanishi, et al. Control, Planning, Learning, and Imitation with Dynamic Movement Primitives. 2003.

[11] Christophe Andrieu, Nando de Freitas, et al. An Introduction to MCMC for Machine Learning. Machine Learning, 2003.

[12] Alin Albu-Schäffer, et al. Learning from Demonstration: Repetitive Movements for Autonomous Service Robotics. 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2004.

[13] Jun Morimoto, et al. Learning from Demonstration and Adaptation of Biped Locomotion. Robotics and Autonomous Systems, 2004.

[14] Jun Morimoto, et al. A Framework for Learning Biped Locomotion with Dynamical Movement Primitives. 4th IEEE/RAS International Conference on Humanoid Robots, 2004.

[15] Stefan Schaal, et al. Rapid Synchronization and Accurate Phase-Locking of Rhythmic Motor Primitives. 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005.

[16] Jan Peters and Stefan Schaal. Policy Gradient Methods for Robotics. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006.

[17] Aude Billard, et al. Reinforcement Learning for Imitating Constrained Reaching Movements. Advanced Robotics, 2007.

[18] Jun Nakanishi, et al. Experimental Evaluation of Task Space Position/Orientation Control Towards Compliant Control for Humanoid Robots. 2007.

[19] Jan Peters and Stefan Schaal. Reinforcement Learning for Operational Space Control. Proceedings of the 2007 IEEE International Conference on Robotics and Automation, 2007.

[20] G. Wulf. Attention and Motor Skill Learning. 2007.

[21] Stefan Schaal, et al. Dynamics Systems vs. Optimal Control -- A Unifying View. Progress in Brain Research, 2007.

[22] Thomas Rückstieß, Martin Felder, and Jürgen Schmidhuber. State-Dependent Exploration for Policy Gradient Methods. ECML/PKDD, 2008.

[23] Dana Kulic, et al. Incremental Learning of Full Body Motion Primitives for Humanoid Robots. Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots, 2008.

[24] Jens Kober and Jan Peters. Policy Search for Motor Primitives in Robotics. NIPS, 2008.

[25] Matthew Howard, Sethu Vijayakumar, et al. A Novel Method for Learning Policies from Variable Constraint Data. Autonomous Robots, 2009.

[26] David Silver, et al. Learning to Search: Functional Gradient Techniques for Imitation Learning. Autonomous Robots, 2009.

[27] Martin A. Riedmiller, et al. Reinforcement Learning for Robot Soccer. Autonomous Robots, 2009.

[28] Matthew Howard, Sethu Vijayakumar, et al. Methods for Learning Control Policies from Variable-Constraint Demonstrations. In From Motor Learning to Interaction Learning in Robots, 2010.

[29] Dana Kulic, et al. Incremental Learning of Full Body Motion Primitives. In From Motor Learning to Interaction Learning in Robots, 2010.

[30] Olivier Sigaud and Jan Peters (eds.). From Motor Learning to Interaction Learning in Robots. 2010.