Tom Schaul | Simon Osindero | Georg Ostrovski | Diana Borsa | Will Dabney | David Szepesvari | David Ding
[1] Pierre-Yves Oudeyer, et al. Intrinsic Motivation Systems for Autonomous Mental Development, 2007, IEEE Transactions on Evolutionary Computation.
[2] Yoshua Bengio, et al. Algorithms for Hyper-Parameter Optimization, 2011, NIPS.
[3] Richard Evans, et al. Deep Reinforcement Learning in Large Discrete Action Spaces, 2015, ArXiv.
[4] Javier García, et al. A Comprehensive Survey on Safe Reinforcement Learning, 2015, J. Mach. Learn. Res.
[5] Sergey Levine, et al. Learning Actionable Representations with Goal-Conditioned Policies, 2018, ICLR.
[6] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[7] Omar Besbes, et al. Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-Stationary Rewards, 2014, Stochastic Systems.
[8] Sheetal Kalyani, et al. Taming Non-stationary Bandits: A Bayesian Approach, 2017, ArXiv.
[9] Martial Hebert, et al. Learning by Asking Questions, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Martha White, et al. Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study, 2019, ArXiv.
[11] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[12] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[13] Sergey Levine, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[14] Marc G. Bellemare, et al. An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents, 2018, IJCAI.
[15] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[16] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[17] Pierre Baldi, et al. Bayesian surprise attracts human attention, 2005, Vision Research.
[18] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[19] Sergey Levine, et al. Visual Reinforcement Learning with Imagined Goals, 2018, NeurIPS.
[20] David Silver, et al. On Inductive Biases in Deep Reinforcement Learning, 2019, ArXiv.
[21] Daochen Zha, et al. Experience Replay Optimization, 2019, IJCAI.
[22] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[23] Tom Schaul, et al. Universal Successor Features Approximators, 2018, ICLR.
[24] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[25] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[26] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[27] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[28] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[29] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[30] Yee Whye Teh, et al. Mix&Match - Agent Curricula for Reinforcement Learning, 2018, ICML.
[31] Sergey Levine, et al. Latent Space Policies for Hierarchical Reinforcement Learning, 2018, ICML.
[32] Peter Stone, et al. The Impact of Nondeterminism on Reproducibility in Deep Reinforcement Learning, 2018.
[33] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[34] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[35] Shaun S. Wang. A Class of Distortion Operators for Pricing Financial and Insurance Risks, 2000.
[36] Marco Mirolli, et al. Functions and Mechanisms of Intrinsic Motivations, 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.
[37] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[38] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[39] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[40] Jürgen Schmidhuber, et al. Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes, 2008, ABiALS.
[41] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[42] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.