暂无分享,去创建一个
[1] Marcello Restelli,et al. Estimating the Maximum Expected Value in Continuous Reinforcement Learning Problems , 2017, AAAI.
[2] Jürgen Schmidhuber,et al. Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts , 2006, Connect. Sci..
[3] Marcello Restelli,et al. Exploiting Action-Value Uncertainty to Drive Exploration in Reinforcement Learning , 2019, 2019 International Joint Conference on Neural Networks (IJCNN).
[4] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[5] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[6] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res..
[7] B. Efron. The jackknife, the bootstrap, and other resampling plans , 1987 .
[8] Peter Stone,et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots , 2012, Machine Learning.
[9] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[10] Ian Osband,et al. Risk versus Uncertainty in Deep Learning: Bayes, Bootstrap and the Dangers of Dropout , 2016 .
[11] E. Deci,et al. Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions. , 2000, Contemporary educational psychology.
[12] Jürgen Schmidhuber,et al. A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .
[13] Deepak Pathak,et al. Self-Supervised Exploration via Disagreement , 2019, ICML.
[14] Andrea Bonarini,et al. Exploiting structure and uncertainty of Bellman updates in Markov decision processes , 2017, 2017 IEEE Symposium Series on Computational Intelligence (SSCI).
[15] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[16] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[17] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[18] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[19] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[20] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .
[21] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[22] Marcin Andrychowicz,et al. Parameter Space Noise for Exploration , 2017, ICLR.
[23] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2017, J. Mach. Learn. Res..
[24] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[25] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[26] Chong Li,et al. Model-Free Reinforcement Learning , 2019, Reinforcement Learning for Cyber-Physical Systems.
[27] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[28] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[29] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn..
[30] B. Efron,et al. The Jackknife: The Bootstrap and Other Resampling Plans. , 1983 .
[31] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[32] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[33] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[34] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[35] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[36] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[37] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[38] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.
[39] Kavosh Asadi,et al. An Alternative Softmax Operator for Reinforcement Learning , 2016, ICML.
[40] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[41] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[42] T. L. Lai Andherbertrobbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[43] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[44] Amos J. Storkey,et al. Exploration by Random Network Distillation , 2018, ICLR.
[45] Demis Hassabis,et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.
[46] Xiaoyu Chen,et al. Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP , 2019, ICLR.
[47] Zoubin Ghahramani,et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.
[48] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[49] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[50] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[51] David Tse,et al. Time-Sensitive Bandit Learning and Satisficing Thompson Sampling , 2017, ArXiv.
[52] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[53] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[54] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[55] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[56] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[57] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[58] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .