Markov Decision Processes with Arbitrary Reward Processes
[1] J. Renegar. Some perturbation theory for linear programming, 1994, Math. Program.
[2] J. Filar, et al. Competitive Markov Decision Processes, 1996.
[3] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[4] Andreu Mas-Colell, et al. A General Class of Adaptive Strategies, 1999, J. Econ. Theory.
[5] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[6] James Hannan, et al. Approximation to Bayes Risk in Repeated Play, 1958.
[7] Shie Mannor, et al. Regret minimization in repeated matrix games with variable stage duration, 2008, Games Econ. Behav.
[8] D. Blackwell. An analog of the minimax theorem for vector payoffs, 1956.
[9] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[10] R. Aumann. Markets with a continuum of traders, 1964.
[11] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[12] Mark Herbster, et al. Tracking the Best Expert, 1995, Machine Learning.
[13] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[14] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[15] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.
[16] D. Fudenberg, et al. The Theory of Learning in Games, 1998.
[17] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[18] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[19] Andrew G. Barto, et al. An Actor/Critic Algorithm that is Equivalent to Q-Learning, 1994, NIPS.
[20] Ehud Lehrer, et al. A wide range no-regret theorem, 2003, Games Econ. Behav.
[21] Y. Freund, et al. Adaptive game playing using multiplicative weights, 1999.
[22] Philip Wolfe, et al. Contributions to the theory of games, 1953.
[23] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
[24] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[25] S. M. Robinson. Bounds for error in the solution set of a perturbed linear program, 1973.
[26] S. Bobkov, et al. Modified Logarithmic Sobolev Inequalities in Discrete Settings, 2006.
[27] Ward Whitt, et al. A Nonstationary Offered-Load Model for Packet Networks, 2001, Telecommun. Syst.
[28] Shie Mannor, et al. The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes, 2003, Math. Oper. Res.
[29] Yishay Mansour, et al. Experts in a Markov Decision Process, 2004, NIPS.
[30] Neri Merhav, et al. On sequential strategies for loss functions with memory, 2002, IEEE Trans. Inf. Theory.
[31] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.
[32] David M. Kreps, et al. Learning Mixed Equilibria, 1993.
[33] Gábor Lugosi, et al. Prediction, learning, and games, 2006.