Regularized Hierarchical Policies for Compositional Transfer in Robotics

The successful application of flexible, general learning algorithms -- such as deep reinforcement learning -- to real-world robotics is often limited by poor data efficiency. Domains with more than a single dominant task of interest call for algorithms that share partial solutions across tasks in order to limit the required experiment time. We develop and investigate simple hierarchical inductive biases -- in the form of structured policies -- as a mechanism for knowledge transfer across tasks in reinforcement learning (RL). To leverage the power of these structured policies, we design an RL algorithm that enables stable and fast learning. We demonstrate the effectiveness of our method in simulated robot environments (locomotion and manipulation domains) as well as in real robot experiments, achieving substantially better data efficiency than competitive baselines.
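To make the notion of a "structured policy" concrete, the following is a minimal sketch (not the paper's implementation) of one common instantiation: shared low-level Gaussian components reused across tasks, with a task-conditioned high-level controller that mixes over them. All names, shapes, and the linear-Gaussian form are illustrative assumptions.

```python
# Minimal sketch of a hierarchical mixture policy: shared low-level Gaussian
# components, task-specific mixture weights as the high-level controller.
# Shapes, names, and the linear-Gaussian form are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

NUM_TASKS, NUM_COMPONENTS, OBS_DIM, ACT_DIM = 3, 4, 8, 2

# Shared low-level components: linear-Gaussian policies reused across all tasks.
W = rng.normal(scale=0.1, size=(NUM_COMPONENTS, ACT_DIM, OBS_DIM))
log_std = np.zeros((NUM_COMPONENTS, ACT_DIM))

# Task-specific high-level controller: logits over components, one row per task.
hl_logits = np.zeros((NUM_TASKS, NUM_COMPONENTS))

def sample_action(obs, task_id):
    """Sample an action from the task-conditioned mixture policy."""
    logits = hl_logits[task_id]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    k = rng.choice(NUM_COMPONENTS, p=probs)   # high-level: pick a component
    mean = W[k] @ obs                         # low-level: Gaussian action
    return mean + np.exp(log_std[k]) * rng.normal(size=ACT_DIM)

action = sample_action(rng.normal(size=OBS_DIM), task_id=1)
```

Because the low-level components are shared, experience gathered on one task can improve behavior on the others, while the per-task high-level weights keep the tasks from interfering with each other.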
