Reinforcement learning in robotics: A survey

Reinforcement learning offers robotics a framework and a set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning and notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free methods, as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.
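To make the model-free, value-function-based family mentioned above concrete, the following is a minimal sketch (not taken from the survey): tabular Q-learning on a toy chain MDP. The environment, parameters, and function names here are illustrative assumptions chosen for brevity, not a method from the paper.

```python
import random

# Toy deterministic chain MDP (illustrative assumption, not from the survey):
# states 0..4, actions 0 = left, 1 = right; reaching state 4 yields reward 1
# and ends the episode.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Model-free, value-function-based learning: estimate Q(s, a) from
    sampled transitions and act greedily with respect to the estimate."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(GOAL)              # random non-terminal start state
        for _ in range(50):                  # cap episode length
            if rng.random() < epsilon:       # epsilon-greedy exploration
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] >= q[s][1] else 1
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])  # temporal-difference update
            if done:
                break
            s = s2
    return q

q = q_learning()
# The greedy policy should move right (toward the goal) from every state.
policy = [0 if q[s][0] >= q[s][1] else 1 for s in range(GOAL)]
print(policy)
```

A policy-search method would instead parameterize the policy directly (e.g. action probabilities per state) and adjust the parameters along an estimated gradient of expected return, never maintaining a value table; the survey's central question is when each of these formulations pays off on a physical robot.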
