Regret Minimization in Nonstationary Markov Decision Processes
[1] P. Schweitzer. Perturbation theory and finite Markov chains , 1968 .
[2] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[3] Süleyman Özekici. Markov modulated Bernoulli process , 1997, Math. Methods Oper. Res..
[4] Yishay Mansour,et al. Online Markov Decision Processes , 2009, Math. Oper. Res..
[5] Shie Mannor,et al. Online learning in Markov decision processes with arbitrarily changing rewards and transitions , 2009, 2009 International Conference on Game Theory for Networks.
[6] Manfred K. Warmuth,et al. The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.
[7] James Hannan,et al. Approximation to Bayes Risk in Repeated Play , 1958 .
[8] Sandeep Pandey,et al. Handling Advertisements of Unknown Quality in Search Advertising , 2006, NIPS.
[9] C. Fuh. Asymptotic operating characteristics of an optimal change point detection in hidden Markov models , 2004, math/0503682.
[10] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[11] Philip Wolfe,et al. Contributions to the theory of games , 1953 .
[12] Shie Mannor,et al. The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes , 2003, Math. Oper. Res..
[13] Peng Shi,et al. Limiting Average Criteria For Nonstationary Markov Decision Processes , 2000, SIAM J. Optim..
[14] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[15] Harold J. Kushner,et al. Stochastic Approximation Algorithms and Applications , 1997, Applications of Mathematics.
[16] Shie Mannor,et al. The Robustness-Performance Tradeoff in Markov Decision Processes , 2006, NIPS.
[17] Yishay Mansour,et al. Experts in a Markov Decision Process , 2004, NIPS.
[18] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[19] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[20] Shie Mannor,et al. Arbitrarily modulated Markov decision processes , 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[21] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[22] Shie Mannor,et al. Markov Decision Processes with Arbitrary Reward Processes , 2009, Math. Oper. Res..
[23] Naoki Abe,et al. Learning to Optimally Schedule Internet Banner Advertisements , 1999, ICML.
[24] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[25] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[26] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[27] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..
[28] D. Donoho,et al. Uncertainty principles and signal recovery , 1989 .
[29] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[30] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[31] Andrew W. Moore,et al. Locally Weighted Learning , 1997, Artificial Intelligence Review.
[32] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, Journal of computer and system sciences (Print).
[33] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[34] J. Filar,et al. Competitive Markov Decision Processes , 1996 .