Learning Replanning Policies With Direct Policy Search

Direct policy search has been successful in learning challenging real-world robotic motor skills by learning open-loop movement primitives with high sample efficiency. These primitives can be generalized to different contexts with varying initial configurations and goals. Current state-of-the-art contextual policy search algorithms can however not adapt to changing, noisy context measurements. Yet, these are common characteristics of real-world robotic tasks. Planning a trajectory ahead based on an inaccurate context that may change during the motion often results in poor accuracy, especially with highly dynamical tasks. To adapt to updated contexts, it is sensible to learn trajectory replanning strategies. We propose a framework to learn trajectory replanning policies via contextual policy search and demonstrate that they are safe for the robot, can be learned efficiently, and outperform non-replanning policies for problems with partially observable or perturbed context.

[1]  Jan Peters,et al.  Empowered skills , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[2]  Stefan Schaal,et al.  Natural Actor-Critic , 2003, Neurocomputing.

[3]  Aude Billard,et al.  Learning Stable Nonlinear Dynamical Systems With Gaussian Mixture Models , 2011, IEEE Transactions on Robotics.

[4]  Betty J. Mohler,et al.  Learning perceptual coupling for motor primitives , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[5]  S. Schaal Dynamic Movement Primitives -A Framework for Motor Control in Humans and Humanoid Robotics , 2006 .

[6]  Hany Abdulsamad,et al.  Model-Free Trajectory Optimization for Reinforcement Learning , 2016, ICML.

[7]  Jan Peters,et al.  Policy Search for Motor Primitives in Robotics , 2008, NIPS 2008.

[8]  Stefan Schaal,et al.  Biologically-inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance , 2009, 2009 IEEE International Conference on Robotics and Automation.

[9]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[10]  Jan Peters,et al.  Layered direct policy search for learning hierarchical skills , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Jun Morimoto,et al.  Task-Specific Generalization of Discrete and Periodic Dynamic Movement Primitives , 2010, IEEE Transactions on Robotics.

[12]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[13]  Darwin G. Caldwell,et al.  Robot motor skill coordination with EM-based Reinforcement Learning , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14]  Guy Shani,et al.  A survey of point-based POMDP solvers , 2013, Autonomous Agents and Multi-Agent Systems.

[15]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[16]  Luís Paulo Reis,et al.  Model-Based Relative Entropy Stochastic Search , 2016, NIPS.

[17]  Jun Nakanishi,et al.  Movement imitation with nonlinear dynamical systems in humanoid robots , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).

[18]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[19]  Yasemin Altun,et al.  Relative Entropy Policy Search , 2010 .

[20]  Jan Peters,et al.  A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.

[21]  Oliver Kroemer,et al.  Learning to select and generalize striking movements in robot table tennis , 2012, AAAI Fall Symposium: Robots Learning Interactively from Human Teachers.

[22]  Jan Peters,et al.  Data-Efficient Generalization of Robot Skills with Contextual Policy Search , 2013, AAAI.

[23]  Stefan Schaal,et al.  Online movement adaptation based on previous sensor experiences , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[24]  VelosoManuela,et al.  A survey of robot learning from demonstration , 2009 .

[25]  Stefan Schaal,et al.  Dynamics systems vs. optimal control--a unifying view. , 2007, Progress in brain research.

[26]  Bruno Castro da Silva,et al.  Learning Parameterized Skills , 2012, ICML.

[27]  Yasuharu Koike,et al.  PII: S0893-6080(96)00043-3 , 1997 .

[28]  Bernhard Schölkopf,et al.  Anticipatory action selection for human-robot table tennis , 2017, Artif. Intell..

[29]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.