Reinforcement Learning in Robust Markov Decision Processes
[1] Colin McDiarmid,et al. On the method of bounded differences , 1989, Surveys in Combinatorics.
[2] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[3] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[4] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[5] E. Ordentlich,et al. Inequalities for the L1 Deviation of the Empirical Distribution , 2003 .
[6] Yishay Mansour,et al. Experts in a Markov Decision Process , 2004, NIPS.
[7] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..
[8] Garud Iyengar,et al. Robust Dynamic Programming , 2005, Math. Oper. Res..
[9] Ambuj Tewari,et al. Bounded Parameter Markov Decision Processes with Average Reward Criterion , 2007, COLT.
[10] John N. Tsitsiklis,et al. Bias and Variance Approximation in Value Function Estimates , 2007, Manag. Sci..
[11] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[12] Shie Mannor,et al. Markov Decision Processes with Arbitrary Reward Processes , 2008, Math. Oper. Res..
[13] Yishay Mansour,et al. Online Markov Decision Processes , 2009, Math. Oper. Res..
[14] Shie Mannor,et al. Arbitrarily modulated Markov decision processes , 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[15] Shie Mannor,et al. Distributionally Robust Markov Decision Processes , 2010, Math. Oper. Res..
[16] Shie Mannor,et al. Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty , 2012, ICML.
[17] Aleksandrs Slivkins,et al. The Best of Both Worlds: Stochastic and Adversarial Bandits , 2012, COLT.
[18] András György,et al. The adversarial stochastic shortest path problem with unknown transition probabilities , 2012, AISTATS.
[19] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.