Exploration Driven by an Optimistic Bellman Equation
Jan Peters | Marcello Restelli | Carlo D'Eramo | Joni Pajarinen | Samuele Tosatto