Arbitrarily modulated Markov decision processes
[1] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[3] James Hannan,et al. Approximation to Bayes Risk in Repeated Play, 1958.
[4] L. Shapley,et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[5] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[6] Harold J. Kushner,et al. Stochastic Approximation Algorithms and Applications , 1997, Applications of Mathematics.
[7] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[8] Shie Mannor,et al. Online learning in Markov decision processes with arbitrarily changing rewards and transitions , 2009, 2009 International Conference on Game Theory for Networks.
[9] Shie Mannor,et al. The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes, 2003, Math. Oper. Res.
[10] Manfred K. Warmuth,et al. The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.
[11] Shie Mannor,et al. The Robustness-Performance Tradeoff in Markov Decision Processes , 2006, NIPS.
[12] Yishay Mansour,et al. Experts in a Markov Decision Process , 2004, NIPS.
[13] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[14] J. Filar,et al. Competitive Markov Decision Processes, 1996.
[15] Philip Wolfe,et al. Contributions to the theory of games, 1953.
[16] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.
[17] Süleyman Özekici. Markov modulated Bernoulli process, 1997, Math. Methods Oper. Res.
[18] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[19] C. Fuh. Asymptotic operating characteristics of an optimal change point detection in hidden Markov models , 2004, math/0503682.
[20] Shie Mannor,et al. Markov Decision Processes with Arbitrary Reward Processes, 2009, Math. Oper. Res.
[21] Gábor Lugosi,et al. Prediction, learning, and games, 2006.
[22] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[23] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[24] D. Donoho,et al. Uncertainty principles and signal recovery, 1989.
[25] Santosh S. Vempala,et al. Efficient algorithms for online decision problems, 2005, Journal of Computer and System Sciences.