Near-optimal Regret Bounds for Stochastic Shortest Path
Haim Kaplan | Yishay Mansour | Alon Cohen | Aviv Rosenberg
[1] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[2] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[3] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[4] András György, et al. The adversarial stochastic shortest path problem with unknown transition probabilities, 2012, AISTATS.
[5] Dimitri P. Bertsekas, et al. Stochastic Shortest Path Problems Under Weak Conditions, 2013.
[6] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[7] Yishay Mansour, et al. Online Convex Optimization in Adversarial Markov Decision Processes, 2019, ICML.
[8] Yishay Mansour, et al. Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function, 2019, NeurIPS.
[9] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[10] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[11] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[12] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res.
[13] Alessandro Lazaric, et al. No-Regret Exploration in Goal-Oriented Reinforcement Learning, 2019, ICML.
[14] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[15] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[16] Gergely Neu, et al. Online learning in episodic Markovian decision processes by relative entropy policy search, 2013, NIPS.