Machine Learning for Motor Skills in Robotics

Autonomous robots that can adapt to novel situations have been a long-standing vision of robotics, artificial intelligence, and the cognitive sciences. Early approaches to this goal during the heyday of artificial intelligence research in the late 1980s, however, made it clear that methods based purely on reasoning or human insight would not be able to model all the perceptuomotor tasks of future robots. Instead, new hope was placed in the growing field of machine learning, which promised fully adaptive control algorithms that learn both by observation and by trial and error. To date, however, learning techniques have yet to fulfill this promise: only a few methods manage to scale to the high-dimensional domains of manipulator and humanoid robotics, and even then, scaling has usually been achieved only in carefully pre-structured domains. We have investigated the ingredients of a general approach to motor skill learning in order to move one step closer to human-like performance. To this end, we study two major components of such an approach: first, a theoretically well-founded, general way of representing the control structures required for task representation and execution, and second, appropriate learning algorithms that can be applied in this setting.
