Learning Compositional Neural Programs for Continuous Control

We propose a novel solution to challenging sparse-reward, continuous control problems that require hierarchical planning at multiple levels of abstraction. Our solution, dubbed AlphaNPI-X, involves three separate stages of learning. First, we use off-policy reinforcement learning with experience replay to learn a set of atomic goal-conditioned policies that can easily be repurposed for many tasks. Second, we learn self-models that describe the effect of each atomic policy on the environment. Third, we harness the self-models to learn recursive compositional programs with multiple levels of abstraction. The key insight is that the self-models enable planning by imagination, obviating the need for interaction with the world when learning higher-level compositional programs. To accomplish the third stage of learning, we extend the AlphaNPI algorithm, which applies AlphaZero to learn recursive neural programmer-interpreters. We empirically show that AlphaNPI-X can effectively learn to tackle challenging sparse-reward manipulation tasks, such as stacking multiple blocks, where powerful model-free baselines fail.
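
To make the three-stage structure concrete, the following is a minimal PyTorch sketch of the pieces the abstract describes. It is an illustration under assumptions, not the paper's implementation: the names (AtomicPolicy, SelfModel, imagine_rollout), network sizes, and the evaluation loop are all hypothetical, and the actual training procedures (off-policy RL with experience replay for stage one, fitting the self-models for stage two, and the AlphaNPI tree search for stage three) are omitted.

    # Minimal sketch of the three AlphaNPI-X learning stages.
    # All class/function names and architectures here are illustrative
    # assumptions, not the paper's actual implementation.
    import torch
    import torch.nn as nn

    class AtomicPolicy(nn.Module):
        """Stage 1: a goal-conditioned policy pi(a | s, g), trained with
        off-policy RL and experience replay (training loop omitted)."""
        def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, action_dim), nn.Tanh(),
            )

        def forward(self, state, goal):
            return self.net(torch.cat([state, goal], dim=-1))

    class SelfModel(nn.Module):
        """Stage 2: predicts the state reached after running one atomic
        policy to termination, so that higher levels can plan without
        further interaction with the world."""
        def __init__(self, state_dim, num_policies, hidden=256):
            super().__init__()
            self.emb = nn.Embedding(num_policies, hidden)
            self.net = nn.Sequential(
                nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, state_dim),
            )

        def forward(self, state, policy_idx):
            z = self.emb(policy_idx)
            return self.net(torch.cat([state, z], dim=-1))

    def imagine_rollout(self_model, state, program, reward_fn):
        """Stage 3 (core primitive): evaluate a candidate program, i.e. a
        sequence of atomic-policy indices, entirely in imagination by
        chaining self-model predictions."""
        for idx in program:
            state = self_model(state, torch.tensor(idx))
        return reward_fn(state)  # sparse task reward on the imagined final state

In the full method, stage three would invoke a routine like imagine_rollout inside the AlphaNPI tree search (an AlphaZero-style search over programs), with the learned self-models standing in for the real environment at every node expansion.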
