Learning Motor Skills - From Algorithms to Robot Experiments

This book presents the state of the art in reinforcement learning applied to robotics, both in terms of novel algorithms and applications. It discusses recent approaches that allow robots to learn motor skills, and presents tasks that must take the dynamic behavior of the robot and its environment into account, for which a kinematic movement plan is not sufficient. The book illustrates a method that learns to generalize parameterized motor plans, obtained by imitation or reinforcement learning, by adapting a small set of global parameters with appropriate kernel-based reinforcement learning algorithms. The presented applications explore highly dynamic tasks and exhibit a very efficient learning process. All proposed approaches have been extensively validated on benchmark tasks, in simulation and on real robots. These tasks correspond to sports and games, but the presented techniques are also applicable to more mundane household tasks. The book is based on the first author's doctoral thesis, which won the 2013 EURON Georges Giralt PhD Award.
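The idea of generalizing a parameterized motor plan by adapting a small set of global parameters can be sketched with reward-weighted kernel regression: past rollouts are stored together with the task state, the meta-parameters tried, and the return obtained, and a new situation is mapped to a convex combination of previously successful meta-parameters. This is a minimal illustrative sketch, not the book's exact algorithm; the Gaussian kernel, the `bandwidth` and `reg` parameters, and the fallback to the mean are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Squared-exponential similarity between two task-state vectors."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * bandwidth ** 2))

def predict_meta_params(state, states, metas, returns, reg=1e-2):
    """Reward-weighted kernel regression over previously tried rollouts.

    state   : current task state (e.g. where the ball will bounce)
    states  : task states seen in past rollouts
    metas   : global meta-parameters used in each rollout (one row each),
              e.g. the goal position and duration of a motor primitive
    returns : return obtained by each rollout (higher is better)
    """
    # Weight each past rollout by similarity to the current state and
    # by how well it performed (negative returns contribute nothing).
    w = np.array([gaussian_kernel(state, s) * max(r, 0.0)
                  for s, r in zip(states, returns)])
    if w.sum() < reg:
        # No informative neighbours: fall back to the average meta-parameters.
        return np.mean(np.asarray(metas, dtype=float), axis=0)
    w = w / w.sum()
    return w @ np.asarray(metas, dtype=float)

# Toy usage: two past rollouts; the second matched the query state and
# scored much better, so the prediction lands close to its parameters.
states  = [np.array([0.0]), np.array([1.0])]
metas   = np.array([[0.2, 0.5], [0.8, 0.9]])
returns = [0.1, 1.0]
print(predict_meta_params(np.array([1.0]), states, metas, returns))
```

The prediction stays inside the convex hull of previously tried meta-parameters, which keeps exploration conservative; the actual kernel-based algorithms treated in the book additionally handle exploration and cost-weighted regularization.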


[256]  Oliver Kroemer,et al.  Learning visual representations for perception-action systems , 2011, Int. J. Robotics Res..

[257]  Howie Choset,et al.  Using response surfaces and expected improvement to optimize snake robot gait parameters , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[258]  Klas Kronander,et al.  Learning to control planar hitting motions in a minigolf-like task , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[259]  Jan Peters,et al.  Relative Entropy Inverse Reinforcement Learning , 2011, AISTATS.

[260]  Gerd Hirzinger,et al.  Trajectory planning for optimal robot catching in real-time , 2011, 2011 IEEE International Conference on Robotics and Automation.

[261]  Jan Peters,et al.  Learning elementary movements jointly with a higher level task , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[262]  Jan Peters,et al.  Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..

[263]  Jan Peters,et al.  Reinforcement Learning to Adjust Parametrized Motor Primitives to New Situations , 2011 .

[264]  J. Andrew Bagnell,et al.  Reinforcement Planning: RL for optimal planners , 2012, 2012 IEEE International Conference on Robotics and Automation.

[265]  Jan Peters,et al.  Hierarchical Relative Entropy Policy Search , 2014, AISTATS.

[266]  Peter Vrancx,et al.  Reinforcement Learning: State-of-the-Art , 2012 .

[267]  Jean-Paul Chilès,et al.  Wiley Series in Probability and Statistics , 2012 .

[268]  Ales Ude,et al.  Applying statistical generalization to determine search direction for reinforcement learning of movement primitives , 2012, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012).

[269]  Jan Peters,et al.  Learning Prioritized Control of Motor Primitives , 2012, ArXiv.

[270]  Bruno Castro da Silva,et al.  Learning Parameterized Skills , 2012, ICML.

[272]  Sanjiban Choudhury  Application of Reinforcement Learning in Robot Soccer , 2013 .

[273]  Luke Kerr Real time learning , 2015 .

[274]  Marina Bosch,et al.  A Robot Ping-Pong Player: Experiment in Real-Time Intelligent Control , 2016 .

[275]  Bernd Faust,et al.  Model-Based Control of a Robot Manipulator , 1988 .

[276]  P. Glynn  Likelihood Ratio Gradient Estimation: An Overview , 1987 .

[277]  K. Schittkowski,et al.  Nonlinear Programming , 2022 .