[1] Jan Leike. Exploration Potential, 2016, arXiv.
[2] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[3] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[4] Ronald Ortner, et al. Pseudometrics for State Aggregation in Average Reward Markov Decision Processes, 2007, ALT.
[5] Csaba Szepesvári, et al. Model-Based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds, 2010, ICML.
[6] Robert L. Smith, et al. Aggregation in Dynamic Programming, 1987, Oper. Res.
[7] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, CVPR Workshops (CVPRW).
[8] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[9] Yishay Mansour, et al. Approximate Equivalence of Markov Decision Processes, 2003, COLT.
[10] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[11] David Andre, et al. State Abstraction for Programmable Reinforcement Learning Agents, 2002, AAAI/IAAI.
[12] Alexandre Proutière, et al. Exploration in Structured Reinforcement Learning, 2018, NeurIPS.
[13] Doina Precup, et al. Methods for Computing State Similarity in Markov Decision Processes, 2006, UAI.
[14] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[15] Benjamin Van Roy, et al. Near-Optimal Reinforcement Learning in Factored MDPs, 2014, NIPS.
[16] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[17] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[18] Balaraman Ravindran. Approximate Homomorphisms: A Framework for Non-exact Minimization in Markov Decision Processes, 2004.
[19] Andrew Y. Ng, et al. Near-Bayesian Exploration in Polynomial Time, 2009, ICML.
[20] Lihong Li, et al. PAC Model-Free Reinforcement Learning, 2006, ICML.
[21] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[22] Marcus Hutter, et al. Extreme State Aggregation beyond MDPs, 2014, ALT.
[23] Sham M. Kakade. On the Sample Complexity of Reinforcement Learning, 2003.
[24] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[25] Michael L. Littman, et al. A Unifying Framework for Computational Reinforcement Learning Theory, 2009.
[26] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[27] Michael L. Littman, et al. An Analysis of Model-Based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[28] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[29] Ronald Ortner, et al. Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes, 2013, Ann. Oper. Res.
[30] Alessandro Lazaric, et al. Exploration-Exploitation in MDPs with Options, 2016.
[31] Robert Givan, et al. Bounded-Parameter Markov Decision Processes, 2000, Artif. Intell.
[32] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[33] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[34] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[35] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[36] Peter Auer, et al. Near-Optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[37] J. A. Fill. Eigenvalue Bounds on Convergence to Stationarity for Nonreversible Markov Chains, 1991.
[38] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[39] Lihong Li, et al. PAC-Inspired Option Discovery in Lifelong Reinforcement Learning, 2014, ICML.