Reinforcement Learning for Average Reward Zero-Sum Games
[1] D. Bertsekas,et al. Stochastic Shortest Path Games , 1999 .
[2] Shie Mannor,et al. The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes , 2003, Math. Oper. Res..
[3] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[4] Yishay Mansour,et al. Convergence of Optimistic and Incremental Q-Learning , 2001, NIPS.
[5] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[6] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[7] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[8] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[9] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[10] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .
[11] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[12] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[13] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[14] Carlos S. Kubrusly,et al. Stochastic approximation algorithms and applications , 1973, CDC 1973.
[15] V. Borkar. Asynchronous Stochastic Approximations , 1998 .
[16] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[17] Vivek S. Borkar,et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms , 1997, SIAM J. Control. Optim..
[18] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[19] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[20] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[21] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[22] J. Filar,et al. Competitive Markov Decision Processes , 1996 .
[23] Richard S. Sutton,et al. Reinforcement Learning , 1992, Handbook of Machine Learning.
[24] V. Borkar,et al. An analog scheme for fixed point computation. I. Theory , 1997 .
[25] Peter Dayan,et al. Technical Note: Q-Learning , 2004, Machine Learning.
[26] Csaba Szepesvári,et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms , 1999, Neural Computation.
[27] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[28] V. Borkar. Stochastic approximation with two time scales , 1997 .
[29] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.