Learning Control in Robotics

Recent trends in robot learning are to use trajectory-based optimal control techniques and reinforcement learning to scale up to complex robotic systems. On the one hand, increased computational power and multiprocessing, and on the other hand, probabilistic reinforcement learning methods and function approximation, have contributed to a steadily increasing interest in robot learning. Imitation learning has helped significantly by providing a reasonable initial behavior from which learning can start. However, many applications are still restricted to rather low-dimensional domains and toy problems. Future work will have to demonstrate the continual and autonomous learning abilities alluded to in the introduction.
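
To make the imitation-then-reinforcement pattern mentioned above concrete, the following is a minimal illustrative sketch, not taken from the article: a linear-Gaussian policy is first fit by supervised regression to demonstrated state-action pairs (imitation learning) and then refined with a REINFORCE-style policy-gradient update. The demonstrations, reward, and dynamics are synthetic placeholders, and the linear policy class is an assumption made purely for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2

# --- Imitation learning: least-squares fit of a linear policy to demonstrations ---
demo_states = rng.normal(size=(100, state_dim))
demo_actions = demo_states @ rng.normal(size=(state_dim, action_dim))   # placeholder "expert"
W, *_ = np.linalg.lstsq(demo_states, demo_actions, rcond=None)          # initial policy weights

# --- Reinforcement learning: REINFORCE refinement of the initialized policy ---
sigma, alpha = 0.1, 1e-3             # exploration noise and learning rate

def rollout(W, horizon=20):
    """Sample one trajectory with a Gaussian policy a ~ N(s @ W, sigma^2 I)."""
    s = rng.normal(size=state_dim)
    grad, ret = np.zeros_like(W), 0.0
    for _ in range(horizon):
        mean = s @ W
        a = mean + sigma * rng.normal(size=action_dim)
        ret += -np.sum(a ** 2)                        # placeholder reward
        grad += np.outer(s, a - mean) / sigma ** 2    # grad of log-likelihood w.r.t. W
        s = rng.normal(size=state_dim)                # placeholder dynamics
    return grad, ret

for _ in range(50):
    g, R = rollout(W)
    W += alpha * R * g        # vanilla policy-gradient (REINFORCE) step
```

The point of the sketch is only the structure: supervised initialization supplies a sensible starting policy, and gradient-based reinforcement learning then improves it from rollouts; practical robot-learning systems replace each placeholder with a task-specific representation, reward, and simulator or hardware loop.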
