暂无分享,去创建一个
Junhyuk Oh | David Silver | Matteo Hessel | Tom Zahavy | Zhongwen Xu | Hado van Hasselt | Vivek Veeriah | Iurii Kemaev | Satinder Singh | Junhyuk Oh | Satinder Singh | Matteo Hessel | H. V. Hasselt | Zhongwen Xu | Vivek Veeriah | Tom Zahavy | David Silver | Iurii Kemaev
[1] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[2] Marcin Andrychowicz,et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research , 2018, ArXiv.
[3] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2017, J. Mach. Learn. Res..
[4] Sergey Levine,et al. Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? , 2019, ArXiv.
[5] Marlos C. Machado,et al. Eigenoption Discovery through the Deep Successor Representation , 2017, ICLR.
[6] David Silver,et al. Compositional Planning Using Optimal Option Models , 2012, ICML.
[7] Shie Mannor,et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.
[8] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[9] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[10] Lihong Li,et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning , 2014, ICML.
[11] Alec Solway,et al. Optimal Behavioral Hierarchy , 2014, PLoS Comput. Biol..
[12] Marlos C. Machado,et al. A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.
[13] Ali Farhadi,et al. RoboTHOR: An Open Simulation-to-Real Embodied AI Platform , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Honglak Lee,et al. How Should an Agent Practice? , 2019, AAAI.
[15] David Silver,et al. Meta-Gradient Reinforcement Learning , 2018, NeurIPS.
[16] David Warde-Farley,et al. Relative Variational Intrinsic Control , 2020, AAAI.
[17] Sergey Levine,et al. Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.
[18] Peter Stone,et al. The utility of temporal abstraction in reinforcement learning , 2008, AAMAS.
[19] Ali Farhadi,et al. AI2-THOR: An Interactive 3D Environment for Visual AI , 2017, ArXiv.
[20] Shie Mannor,et al. Approximate Value Iteration with Temporally Extended Actions , 2015, J. Artif. Intell. Res..
[21] Jan Peters,et al. Probabilistic inference for determining options in reinforcement learning , 2016, Machine Learning.
[22] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
[23] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[24] Junhyuk Oh,et al. Self-Tuning Deep Reinforcement Learning , 2020, ArXiv.
[25] Philip S. Thomas,et al. Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock , 2019, ArXiv.
[26] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[27] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[28] Doina Precup,et al. When Waiting is not an Option : Learning Options with a Deliberation Cost , 2017, AAAI.
[29] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[30] Andreas Griewank,et al. Evaluating derivatives - principles and techniques of algorithmic differentiation, Second Edition , 2000, Frontiers in applied mathematics.
[31] Takashi Maeno,et al. Hierarchical control method for manipulating/grasping tasks using multi-fingered robot hand , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).
[32] Sridhar Mahadevan,et al. Hierarchical Policy Gradient Algorithms , 2003, ICML.
[33] Pieter Abbeel,et al. Meta Learning Shared Hierarchies , 2017, ICLR.
[34] Shie Mannor,et al. Dynamic abstraction in reinforcement learning via clustering , 2004, ICML.
[35] Alicia P. Wolfe,et al. Identifying useful subgoals in reinforcement learning by local graph partitioning , 2005, ICML.
[36] Yuval Tassa,et al. Learning and Transfer of Modulated Locomotor Controllers , 2016, ArXiv.
[37] Sergey Levine,et al. Diversity is All You Need: Learning Skills without a Reward Function , 2018, ICLR.
[38] Shie Mannor,et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations , 2014, ICML.
[39] Alex Graves,et al. Strategic Attentive Writer for Learning Macro-Actions , 2016, NIPS.
[40] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[41] Philip S. Thomas,et al. Policy Gradient Coagent Networks , 2011, NIPS.
[42] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[43] Shie Mannor,et al. Adaptive Skills Adaptive Partitions (ASAP) , 2016, NIPS.
[44] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[45] Andrew G. Barto,et al. Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.
[46] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[47] Richard L. Lewis,et al. Discovery of Useful Questions as Auxiliary Tasks , 2019, NeurIPS.
[48] Pravesh Ranchod,et al. Nonparametric Bayesian reward segmentation for skill discovery using inverse reinforcement learning , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[49] Marlos C. Machado,et al. Learning Purposeful Behaviour in the Absence of Rewards , 2016, ArXiv.
[50] Doina Precup,et al. The Termination Critic , 2019, AISTATS.
[51] Daan Wierstra,et al. Variational Intrinsic Control , 2016, ICLR.