Control, Planning, Learning, and Imitation with Dynamic Movement Primitives

In both human and humanoid movement science, movement primitives have become central to understanding how complex motion is generated in bodies with many degrees of freedom. A theory of control, planning, learning, and imitation with movement primitives appears crucial for reducing the search space during motor learning and for achieving a high level of autonomy and flexibility in dynamically changing environments. Movement recognition based on the same representations used for movement generation, i.e., movement primitives, is equally tied into these research questions. This paper discusses a comprehensive framework for motor control with movement primitives, using a recently developed theory of dynamic movement primitives (DMPs). DMPs formulate movement primitives as autonomous nonlinear differential equations whose time evolution creates smooth kinematic movement plans. Model-based control theory converts such movement plans into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, providing a flexible and reactive framework for motor planning and execution; indeed, DMPs form complete kinematic control policies, not just a particular desired trajectory. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients offers a general approach to improving DMPs by trial-and-error learning with respect to almost arbitrary optimization criteria, including situations with delayed rewards.
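To make the abstract concrete, the following is a minimal sketch of one common form of discrete DMP equations from the literature: a canonical system (phase variable x) driving a spring-damper transformation system plus a learnable forcing term. The specific gains (alpha_z = 25, beta_z = alpha_z/4 for critical damping, alpha_x = 8) and basis placement are illustrative assumptions, not values given in this paper.

```python
import numpy as np

class DMP:
    """Sketch of a single-DOF discrete dynamic movement primitive.

    Transformation system: tau*z' = alpha_z*(beta_z*(g - y) - z) + f(x)
    Canonical system:      tau*x' = -alpha_x * x
    Gains are illustrative; beta_z = alpha_z/4 gives critical damping.
    """

    def __init__(self, n_basis=10, alpha_z=25.0, alpha_x=8.0):
        self.alpha_z, self.beta_z, self.alpha_x = alpha_z, alpha_z / 4.0, alpha_x
        # Basis centers spaced evenly in time, mapped through the phase decay.
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        self.h = n_basis ** 1.5 / self.c          # heuristic basis widths
        self.w = np.zeros(n_basis)                # learnable forcing weights

    def forcing(self, x, y0, g):
        # Normalized radial basis combination, scaled by phase and amplitude,
        # so the forcing term vanishes as x -> 0 and the goal g is guaranteed.
        psi = np.exp(-self.h * (x - self.c) ** 2)
        return (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - y0)

    def rollout(self, y0, g, tau=1.0, dt=0.001, T=1.0):
        # Euler integration of the coupled canonical/transformation systems.
        x, y, z = 1.0, y0, 0.0
        traj = []
        for _ in range(int(round(T / dt))):
            f = self.forcing(x, y0, g)
            z += dt / tau * (self.alpha_z * (self.beta_z * (g - y) - z) + f)
            y += dt / tau * z
            x += dt / tau * (-self.alpha_x * x)
            traj.append(y)
        return np.array(traj)
```

With zero weights the system is a critically damped point attractor converging smoothly to the goal g; learning shapes the transient via the weights, while temporal and scale invariance follow from tau and the (g - y0) amplitude factor.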
We demonstrate the different ingredients of the DMP approach in various examples, including skill learning from demonstration on the humanoid robot DB and learning of simulated biped walking from a demonstrated trajectory, with self-improvement of the movement patterns toward energy efficiency through resonance tuning.

S. Schaal, J. Peters, J. Nakanishi and A. Ijspeert
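The learning-from-demonstration step mentioned above exploits the linear parameterization of the forcing term: each basis weight can be fit by locally weighted linear regression against a target forcing signal computed from one demonstrated trajectory. This standalone sketch assumes the common discrete DMP form (tau*z' = alpha_z*(beta_z*(g - y) - z) + f); the gains and basis placement are illustrative assumptions.

```python
import numpy as np

def fit_dmp_weights(y_demo, dt, n_basis=15, alpha_z=25.0, alpha_x=8.0):
    """Fit DMP forcing-term weights to one demonstration by locally
    weighted regression, solving one scalar weight per basis function."""
    beta_z = alpha_z / 4.0
    tau = len(y_demo) * dt                      # movement duration
    y0, g = y_demo[0], y_demo[-1]
    yd = np.gradient(y_demo, dt)                # numerical derivatives
    ydd = np.gradient(yd, dt)
    # Target forcing term, rearranged from the transformation system:
    # tau^2*ydd = alpha_z*(beta_z*(g - y) - tau*yd) + f
    f_target = tau ** 2 * ydd - alpha_z * (beta_z * (g - y_demo) - tau * yd)
    t = np.arange(len(y_demo)) * dt
    x = np.exp(-alpha_x * t / tau)              # canonical phase over the demo
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    h = n_basis ** 1.5 / c                      # heuristic basis widths
    xi = x * (g - y0)                           # forcing-term regressor
    w = np.empty(n_basis)
    for i in range(n_basis):
        psi = np.exp(-h[i] * (x - c[i]) ** 2)   # local weighting of samples
        w[i] = (psi * xi * f_target).sum() / ((psi * xi ** 2).sum() + 1e-10)
    return w
```

Because each weight is solved independently against locally weighted data, the fit is a cheap linear least-squares step per basis, which is what makes one-shot imitation from a single demonstration practical.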
