Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
Peter L. Bartlett | Yasin Abbasi-Yadkori | Csaba Szepesvari