Imitation and Reinforcement Learning for Motor Primitives with Perceptual Coupling

Traditional motor primitive approaches largely yield open-loop policies that can only compensate for small perturbations. In this paper, we present a new type of motor primitive policy that serves as a closed-loop policy, together with an appropriate learning algorithm. Our new motor primitives are an augmented version of the dynamical system-based motor primitives [Ijspeert et al., 2002] that incorporates perceptual coupling to external variables. We show that these motor primitives can perform complex tasks such as the Ball-in-a-Cup or Kendama task even with large variance in the initial conditions, where even a skilled human player would be challenged. We initialize the open-loop policies by imitation learning and the perceptual coupling with a handcrafted solution. We first improve the open-loop policies and subsequently the perceptual coupling using a novel reinforcement learning method that is particularly well-suited for dynamical system-based motor primitives.
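The dynamical system-based motor primitives referenced above can be pictured as a second-order attractor toward a goal, modulated by a learned forcing term; perceptual coupling adds a feedback term driven by an external (perceived) variable, turning the open-loop primitive into a closed-loop policy. The following single-DoF sketch illustrates this structure; all gains, the basis-function parameterization, and the linear coupling form are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dmp_rollout(y0, g, weights, centers, widths, y_ext=None, kappa=0.0,
                tau=1.0, alpha_z=25.0, beta_z=6.25, alpha_x=8.0,
                dt=0.001, T=1.0):
    """Integrate a single-DoF dynamical-system motor primitive.

    weights/centers/widths parameterize the learned forcing term;
    y_ext is an optional per-step external (perceived) variable, and
    kappa scales an illustrative linear coupling to it.
    """
    steps = int(T / dt)
    x, y, z = 1.0, y0, 0.0            # canonical phase, position, scaled velocity
    traj = np.empty(steps)
    for i in range(steps):
        psi = np.exp(-widths * (x - centers) ** 2)        # Gaussian basis functions
        f = x * (g - y0) * psi.dot(weights) / (psi.sum() + 1e-10)
        couple = kappa * (y_ext[i] - y) if y_ext is not None else 0.0
        zdot = (alpha_z * (beta_z * (g - y) - z) + f + couple) / tau
        x += -alpha_x * x / tau * dt   # canonical system: phase decays to 0
        z += zdot * dt
        y += z / tau * dt
        traj[i] = y
    return traj
```

With zero weights and no coupling, the system reduces to a critically damped spring pulling `y` to the goal `g`; learning shapes `weights` (imitation, then reinforcement learning) while `kappa`-style coupling terms are what the paper's perceptual augmentation would adapt.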

[1] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 1992.

[2] Christopher G. Atkeson. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming. NIPS, 1993.

[3] Yasuhiro Masutani, et al. Mastering of a Task with Interaction between a Robot and Its Environment: "Kendama" Task. 1993.

[4] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996.

[5] S. Schaal, et al. A Kendama Learning Robot Based on Bi-directional Theory. Neural Networks, 1996.

[6] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning. 1998.

[7] Richard S. Sutton. Dimensions of Reinforcement Learning. 1998.

[8] Jun Nakanishi, et al. Learning Attractor Landscapes for Learning Motor Primitives. NIPS, 2002.

[9] Jun Nakanishi, et al. Movement Imitation with Nonlinear Dynamical Systems in Humanoid Robots. Proceedings of the 2002 IEEE International Conference on Robotics and Automation, 2002.

[10] Jun Nakanishi, et al. Control, Planning, Learning, and Imitation with Dynamic Movement Primitives. 2003.

[11] Christophe Andrieu, Nando de Freitas, et al. An Introduction to MCMC for Machine Learning. Machine Learning, 2003.

[12] Alin Albu-Schäffer, et al. Learning from Demonstration: Repetitive Movements for Autonomous Service Robotics. 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2004.

[13] Jun Morimoto, et al. Learning from Demonstration and Adaptation of Biped Locomotion. Robotics and Autonomous Systems, 2004.

[14] Jun Morimoto, et al. A Framework for Learning Biped Locomotion with Dynamical Movement Primitives. 4th IEEE/RAS International Conference on Humanoid Robots, 2004.

[15] Stefan Schaal, et al. Rapid Synchronization and Accurate Phase-Locking of Rhythmic Motor Primitives. 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005.

[16] Jan Peters and Stefan Schaal. Policy Gradient Methods for Robotics. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006.

[17] Aude Billard, et al. Reinforcement Learning for Imitating Constrained Reaching Movements. Advanced Robotics, 2007.

[18] Jun Nakanishi, et al. Experimental Evaluation of Task Space Position/Orientation Control Towards Compliant Control for Humanoid Robots. 2007.

[19] Jan Peters and Stefan Schaal. Reinforcement Learning for Operational Space Control. Proceedings of the 2007 IEEE International Conference on Robotics and Automation, 2007.

[20] G. Wulf. Attention and Motor Skill Learning. 2007.

[21] Stefan Schaal, et al. Dynamics Systems vs. Optimal Control -- A Unifying View. Progress in Brain Research, 2007.

[22] Thomas Rückstieß, Martin Felder, and Jürgen Schmidhuber. State-Dependent Exploration for Policy Gradient Methods. ECML/PKDD, 2008.

[23] Dana Kulic, et al. Incremental Learning of Full Body Motion Primitives for Humanoid Robots. Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots, 2008.

[24] Jens Kober and Jan Peters. Policy Search for Motor Primitives in Robotics. NIPS, 2008.

[25] Matthew Howard, Sethu Vijayakumar, et al. A Novel Method for Learning Policies from Variable Constraint Data. Autonomous Robots, 2009.

[26] David Silver, et al. Learning to Search: Functional Gradient Techniques for Imitation Learning. Autonomous Robots, 2009.

[27] Martin A. Riedmiller, et al. Reinforcement Learning for Robot Soccer. Autonomous Robots, 2009.

[28] Matthew Howard, Sethu Vijayakumar, et al. Methods for Learning Control Policies from Variable-Constraint Demonstrations. In From Motor Learning to Interaction Learning in Robots, 2010.

[29] Dana Kulic, et al. Incremental Learning of Full Body Motion Primitives. In From Motor Learning to Interaction Learning in Robots, 2010.

[30] Olivier Sigaud and Jan Peters (eds.). From Motor Learning to Interaction Learning in Robots. 2010.