Yishay Mansour | Christoph Dann | Ayush Sekhari | Karthik Sridharan | Mehryar Mohri
[1] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[2] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[3] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[4] Zhengyuan Zhou, et al. Provably Efficient Reinforcement Learning with Aggregated States, 2019, ArXiv.
[5] Max Simchowitz, et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.
[6] Nan Jiang, et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[7] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[8] John N. Tsitsiklis, et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, 2004, J. Mach. Learn. Res.
[9] Paul Weng, et al. Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation, 2019, ArXiv.
[10] Noga Alon, et al. From Bandits to Experts: A Tale of Domination and Independence, 2013, NIPS.
[11] Akshay Krishnamurthy, et al. Reward-Free Exploration for Reinforcement Learning, 2020, ICML.
[12] Jon D. McAuliffe, et al. Time-uniform, nonparametric, nonasymptotic confidence sequences, 2018, The Annals of Statistics.
[13] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[14] Christoph Dann, et al. Strategic Exploration in Reinforcement Learning - New Algorithms and Learning Guarantees, 2020.
[15] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[16] Shie Mannor, et al. From Bandits to Experts: On the Value of Side-Observations, 2011, NIPS.
[17] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[18] Nan Jiang, et al. Provably efficient RL with Rich Observations via Latent State Decoding, 2019, ICML.
[19] Ilya Kostrikov, et al. Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels, 2020, ArXiv.
[20] Pieter Abbeel, et al. Reinforcement Learning with Augmented Data, 2020, NeurIPS.
[21] Lihong Li, et al. Policy Certificates: Towards Accountable Reinforcement Learning, 2018, ICML.
[22] Thodoris Lykouris, et al. Graph regret bounds for Thompson Sampling and UCB, 2019, ALT.
[23] Nan Jiang, et al. On Oracle-Efficient PAC RL with Rich Observations, 2018, NeurIPS.
[24] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[25] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[26] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[27] Marc Lelarge, et al. Leveraging Side Observations in Stochastic Bandits, 2012, UAI.
[28] Tamir Hazan, et al. Online Learning with Feedback Graphs Without the Graphs, 2016, ICML.
[29] Atilla Eryilmaz, et al. Stochastic bandits with side observations on networks, 2014, SIGMETRICS '14.
[30] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[31] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[32] Craig Boutilier, et al. RecSim: A Configurable Simulation Platform for Recommender Systems, 2019, ArXiv.
[33] Claudio Gentile, et al. Online Learning with Abstention, 2017, ICML.
[34] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[35] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[36] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[37] Claudio Gentile, et al. Online Learning with Sleeping Experts and Feedback Graphs, 2019, ICML.
[38] Michal Valko, et al. Online Learning with Noisy Side Observations, 2016, AISTATS.
[39] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[40] Sham M. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[41] Christoph Dann, et al. Sample Efficient Policy Search for Optimal Stopping Domains, 2017, IJCAI.
[42] Benjamin Van Roy, et al. On Lower Bounds for Regret in Reinforcement Learning, 2016, ArXiv.
[43] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[44] Mehryar Mohri, et al. Bandits with Feedback Graphs and Switching Costs, 2019, NeurIPS.