Autonomous reinforcement learning with hierarchical REPS

Future intelligent robots will need to interact with uncertain and changing environments. A key requirement for adapting to such situations is the ability to learn multiple solution strategies for a single problem, so that the agent remains flexible and can fall back on an alternative solution when the preferred one is no longer viable. We propose a unifying framework that supports hierarchical policies and can therefore learn multiple solutions at once. We build our method on relative entropy policy search (REPS), an information-theoretic policy search approach to reinforcement learning, and evaluate it on a real robot system.
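To make the idea concrete, below is a minimal sketch of an episodic REPS update for a mixture of Gaussian sub-policies, one simple way to realize a hierarchical policy that maintains multiple solutions. The dual formulation and the weighted maximum-likelihood update follow the standard REPS derivation; the function names, the uniform sampling distribution, and the placeholder option responsibilities are illustrative assumptions, not the paper's implementation.

```python
# Sketch of an episodic REPS update for a mixture ("hierarchical") policy.
# Illustrative assumptions: Gaussian sub-policies over parameter vectors
# theta, a uniform sampling distribution q, and externally supplied option
# responsibilities (in the hierarchical setting these would come from the
# gating policy).
import numpy as np
from scipy.optimize import minimize


def reps_weights(returns, epsilon=1.0):
    """Solve the REPS dual for the temperature eta and return sample weights.

    REPS maximizes E_p[R] subject to KL(p || q) <= epsilon; its dual is
    g(eta) = eta * epsilon + eta * log(mean(exp(R / eta))), and the optimal
    sample weights are proportional to exp(R / eta*).
    """
    R = returns - np.max(returns)  # shift returns for numerical stability

    def dual(log_eta):
        eta = np.exp(log_eta[0])   # optimize log(eta) to keep eta > 0
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))

    res = minimize(dual, x0=[0.0], method="Nelder-Mead")
    eta = np.exp(res.x[0])
    w = np.exp(R / eta)
    return w / np.sum(w)


def update_mixture(thetas, weights, responsibilities):
    """Weighted maximum-likelihood update of a Gaussian mixture policy.

    Each option o gets its own (mean, covariance), fitted with per-sample
    weights = REPS weight * responsibility of option o for that sample.
    """
    params = []
    for o in range(responsibilities.shape[1]):
        w = weights * responsibilities[:, o]
        w = w / np.sum(w)
        mu = w @ thetas
        diff = thetas - mu
        cov = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(thetas.shape[1])
        params.append((mu, cov))
    return params


# Toy usage: 200 sampled parameter vectors, their returns, and random
# placeholder responsibilities over 3 options.
rng = np.random.default_rng(0)
thetas = rng.normal(size=(200, 5))
returns = -np.sum(thetas ** 2, axis=1)
resp = rng.dirichlet(np.ones(3), size=200)
new_params = update_mixture(thetas, reps_weights(returns, epsilon=0.5), resp)
```

In the full hierarchical method, the responsibilities would be re-estimated from the updated gating and sub-policies in an EM-like loop, with an additional constraint that keeps the options from collapsing onto a single solution.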
