Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
Paulo Rauber | Avinash Ummadisingu | Filipe Mutz | Jürgen Schmidhuber
[1] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[2] Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.
[3] Bruno Castro da Silva,et al. Learning Parameterized Skills , 2012, ICML.
[4] Alexander Fabisch,et al. Active contextual policy search , 2014, J. Mach. Learn. Res..
[5] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[6] Leonid Peshkin,et al. Learning from Scarce Experience , 2002, ICML.
[8] Jan Peters,et al. Data-Efficient Generalization of Robot Skills with Contextual Policy Search , 2013, AAAI.
[9] David Hsu,et al. Factored Contextual Policy Search with Bayesian Optimization , 2016, 2019 International Conference on Robotics and Automation (ICRA).
[10] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[11] Jürgen Schmidhuber,et al. First Experiments with PowerPlay , 2012, Neural networks : the official journal of the International Neural Network Society.
[12] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[15] Ilya Kostrikov,et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play , 2017, ICLR.
[16] Sergey Levine,et al. Divide-and-Conquer Reinforcement Learning , 2017, ICLR.
[17] Jürgen Schmidhuber,et al. Reinforcement Learning Upside Down: Don't Predict Rewards - Just Map Them to Actions , 2019, ArXiv.
[18] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[19] Honglak Lee,et al. Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.
[20] Pieter Abbeel,et al. Reverse Curriculum Generation for Reinforcement Learning , 2017, CoRL.
[21] Jan Peters,et al. Reinforcement Learning to Adjust Parametrized Motor Primitives to New Situations , 2012, Auton. Robots.
[22] Christopher M. Bishop. Pattern Recognition and Machine Learning , 2006, Springer.
[23] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[24] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[25] Tom Schaul,et al. FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.
[26] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[27] Pieter Abbeel,et al. On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient , 2010, NIPS.
[28] Ali Farhadi,et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[29] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[30] Peter Englert,et al. Multi-task policy search for robotics , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).
[31] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[32] Jitendra Malik,et al. Zero-Shot Visual Imitation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[33] Tom Schaul,et al. Unicorn: Continual Learning with a Universal, Off-policy Agent , 2018, ArXiv.
[34] Pranab Kumar Sen,et al. Large Sample Methods in Statistics: An Introduction with Applications , 1993 .
[35] Marcin Andrychowicz,et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research , 2018, ArXiv.
[36] Jürgen Schmidhuber,et al. PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem , 2011, Front. Psychol..
[37] Kate Saenko,et al. Hierarchical Reinforcement Learning with Hindsight , 2018, ArXiv.
[38] Philip S. Thomas,et al. Safe Reinforcement Learning , 2015 .
[39] J. Schmidhuber,et al. Learning to Generate Focus Trajectories for Attentive Vision , 1990 .
[40] Stefan Schaal,et al. Reinforcement Learning of Motor Skills with Policy Gradients , 2008, Neural Networks.
[41] Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 1992, Machine Learning.
[42] Philip S. Thomas,et al. High Confidence Policy Improvement , 2015, ICML.
[43] Michael McCloskey,et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .
[44] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[45] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[46] Advances in Neural Information Processing Systems 30 , 2017, NIPS.
[47] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[48] J. H. Metzen,et al. Bayesian Optimization for Contextual Policy Search , 2015 .
[49] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[50] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[51] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.