MO2: Model-Based Offline Options

The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence. Inspired by neuroscience, discovering behaviours that switch at bottleneck states has long been sought after for inducing plans of minimum description length across tasks. Prior approaches have either supported only online, on-policy bottleneck-state discovery, limiting sample efficiency, or have been confined to discrete state-action domains, restricting applicability. To address this, we introduce Model-Based Offline Options (MO2), an offline hindsight framework that supports sample-efficient bottleneck option discovery over continuous state-action spaces. Once bottleneck options are learnt offline on source domains, they are transferred online to improve exploration and value estimation on the transfer domain. Our experiments show that on complex long-horizon continuous control tasks with sparse, delayed rewards, MO2's properties are essential and lead to performance exceeding that of recent option-learning methods. Additional ablations further demonstrate MO2's impact on option predictability and credit assignment.
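To make the two-phase recipe above concrete, the sketch below pairs an offline hindsight fitting pass with online execution of temporally extended options that switch at predicted bottleneck states. It is a minimal illustration under assumed interfaces: OptionModel, discover_options_offline, run_transfer_episode, and env_step are hypothetical names, the linear models and random high-level option selection are stand-ins, and none of MO2's actual model-based objectives or network architectures are reproduced here.

```python
import numpy as np


class OptionModel:
    """Toy option: a low-level policy plus a termination predictor."""

    def __init__(self, state_dim, action_dim, option_dim, rng):
        # Linear maps stand in for the neural networks used in practice.
        self.policy_w = rng.normal(scale=0.1, size=(state_dim + option_dim, action_dim))
        self.term_w = rng.normal(scale=0.1, size=(state_dim,))

    def act(self, state, option):
        """Option-conditioned low-level action."""
        return np.tanh(np.concatenate([state, option]) @ self.policy_w)

    def terminate_prob(self, state):
        """High termination probability marks a candidate bottleneck state."""
        return 1.0 / (1.0 + np.exp(-state @ self.term_w))


def discover_options_offline(trajectories, model, epochs=10, lr=0.05):
    """Phase 1 (offline): fit option policies to logged trajectories in hindsight.

    Each trajectory is relabelled with a fixed option code and the policy is
    regressed onto the logged actions (behavioural cloning); the real method
    additionally shapes terminations so that options end at predictable,
    bottleneck-like states.
    """
    for _ in range(epochs):
        for states, actions, option in trajectories:
            inputs = np.concatenate(
                [states, np.tile(option, (len(states), 1))], axis=1)
            preds = np.tanh(inputs @ model.policy_w)
            # Gradient of the mean squared error through the tanh output layer.
            err = (preds - actions) * (1.0 - preds ** 2)
            model.policy_w -= lr * inputs.T @ err / len(states)
    return model


def run_transfer_episode(env_step, init_state, model, option_codes, rng, horizon=200):
    """Phase 2 (online): options run as temporally extended actions and are
    switched whenever the termination predictor signals a bottleneck."""
    state, total_reward = init_state, 0.0
    # Random option choice stands in for a learned high-level transfer policy.
    option = option_codes[rng.integers(len(option_codes))]
    for _ in range(horizon):
        state, reward, done = env_step(state, model.act(state, option))
        total_reward += reward
        if model.terminate_prob(state) > 0.5:
            # Bottleneck reached: hand control back and pick the next option.
            option = option_codes[rng.integers(len(option_codes))]
        if done:
            break
    return total_reward
```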
