Collect & Infer - a fresh look at data-efficient Reinforcement Learning
Martin A. Riedmiller | Jost Tobias Springenberg | Nicolas Heess | Roland Hafner
[1] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[2] S. Levine,et al. Accelerating Online Reinforcement Learning with Offline Datasets , 2020, ArXiv.
[3] Sergey Levine,et al. Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.
[4] Sergey Levine,et al. OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning , 2021, ICLR.
[5] N. Heess,et al. Importance Weighted Policy Learning and Adaptation , 2020, ArXiv.
[6] Peter Stone,et al. Reinforcement learning , 2019, Scholarpedia.
[7] S. Levine,et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems , 2020, ArXiv.
[8] Yee Whye Teh,et al. Behavior Priors for Efficient Reinforcement Learning , 2020, J. Mach. Learn. Res..
[9] Kate Saenko,et al. Hierarchical Actor-Critic , 2017, ArXiv.
[11] Pieter Abbeel,et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.
[12] Martin A. Riedmiller,et al. Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models , 2019, CoRL.
[14] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[15] Gabriel Dulac-Arnold,et al. Challenges of Real-World Reinforcement Learning , 2019, ArXiv.
[17] Daan Wierstra,et al. Variational Intrinsic Control , 2016, ICLR.
[18] Martin Lauer,et al. Learning to dribble on a real robot by success and failure , 2008, 2008 IEEE International Conference on Robotics and Automation.
[19] Alexander J. Smola,et al. Meta-Q-Learning , 2020, ICLR.
[21] Sandy H. Huang,et al. On Multi-objective Policy Optimization as a Tool for Reinforcement Learning , 2021, ArXiv.
[22] Georg Ostrovski,et al. Temporally-Extended ε-Greedy Exploration , 2020, ICLR.
[23] Yee Whye Teh,et al. Neural probabilistic motor primitives for humanoid control , 2018, ICLR.
[24] Sergey Levine,et al. Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills , 2021, ICML.
[25] Sergey Levine,et al. Diversity is All You Need: Learning Skills without a Reward Function , 2018, ICLR.
[27] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[28] Pascal Poupart,et al. Bayesian Reinforcement Learning , 2010, Encyclopedia of Machine Learning.
[29] Sergio Gomez Colmenarejo,et al. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning , 2020, NeurIPS.
[30] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
[31] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[32] Sergey Levine,et al. MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale , 2021, ArXiv.
[33] David Rolnick,et al. Experience Replay for Continual Learning , 2018, NeurIPS.
[34] Martin A. Riedmiller,et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning , 2020, ICLR.
[35] Yoshua Bengio,et al. Probabilistic Planning with Sequential Monte Carlo methods , 2018, ICLR.
[36] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[37] Martin A. Riedmiller,et al. Compositional Transfer in Hierarchical Reinforcement Learning , 2019, Robotics: Science and Systems.
[38] Shie Mannor,et al. Bayesian Reinforcement Learning: A Survey , 2015, Found. Trends Mach. Learn..
[40] Jackie Kay,et al. Learning Dexterous Manipulation from Suboptimal Experts , 2020, ArXiv.
[41] Alexei A. Efros,et al. Large-Scale Study of Curiosity-Driven Learning , 2018, ICLR.
[42] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2017, J. Mach. Learn. Res..
[43] Nando de Freitas,et al. Critic Regularized Regression , 2020, NeurIPS.
[44] Sergey Levine,et al. COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning , 2020, ArXiv.
[45] Martin A. Riedmiller,et al. Reinforcement learning on explicitly specified time scales , 2003, Neural Computing & Applications.
[46] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[47] Yee Whye Teh,et al. Information asymmetry in KL-regularized RL , 2019, ICLR.
[48] Nando de Freitas,et al. Robust Imitation of Diverse Behaviors , 2017, NIPS.
[50] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[51] Dushyant Rao,et al. Data-efficient Hindsight Off-policy Option Learning , 2021, ICML.
[52] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[53] Jost Tobias Springenberg,et al. Simple Sensor Intentions for Exploration , 2020, ArXiv.
[54] Karol Hausman,et al. Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.
[55] Lin F. Yang,et al. Toward the Fundamental Limits of Imitation Learning , 2020, NeurIPS.
[56] Martin A. Riedmiller,et al. Batch Reinforcement Learning , 2012, Reinforcement Learning.
[57] Mohammad Norouzi,et al. Dream to Control: Learning Behaviors by Latent Imagination , 2019, ICLR.
[58] Sergey Levine,et al. Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? , 2019, ArXiv.
[59] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[60] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[61] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..
[62] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[63] Jürgen Schmidhuber,et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.
[64] Sergey Levine,et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning , 2019, ArXiv.
[65] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[66] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[67] Martin A. Riedmiller,et al. Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion , 2020, CoRL.
[68] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[69] Pieter Abbeel,et al. Planning to Explore via Self-Supervised World Models , 2020, ICML.
[72] Ion Stoica,et al. DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations , 2017, CoRL.
[73] Sergey Levine,et al. Learning Latent Plans from Play , 2019, CoRL.
[74] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[75] Sham M. Kakade,et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control , 2018, ICLR.
[76] Jackie Kay,et al. Local Search for Policy Iteration in Continuous Control , 2020, ArXiv.
[78] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.
[79] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[80] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.