Efficient learning of multi-step best response
[1] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.
[2] W. Hamilton, et al. The evolution of cooperation, 1984, Science.
[3] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[4] Bikramjit Banerjee, et al. Performance Bounded Reinforcement Learning in Strategic Interactions, 2004, AAAI.
[5] David Carmel, et al. Learning Models of Intelligent Agents, 1996, AAAI/IAAI, Vol. 1.
[6] Vincent Conitzer, et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents, 2003, Machine Learning.
[7] Gerald Tesauro, et al. Extending Q-Learning to General Adaptive Multi-Agent Systems, 2003, NIPS.
[8] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[9] David Carmel, et al. How to explore your opponent's strategy (almost) optimally, 1998, Proceedings International Conference on Multi Agent Systems (Cat. No.98EX160).
[10] David Carmel, et al. Model-based learning of interaction strategies in multi-agent systems, 1998, J. Exp. Theor. Artif. Intell.
[11] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[12] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[13] Ronitt Rubinfeld, et al. Efficient algorithms for learning to play repeated games against computationally bounded adversaries, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[14] Lance Fortnow, et al. Optimality and domination in repeated games with bounded players, 1993, STOC '94.
[15] Yoav Shoham, et al. Polynomial-time reinforcement learning of near-optimal policies, 2002, AAAI/IAAI.
[16] Manuela M. Veloso, et al. Multiagent learning using a variable learning rate, 2002, Artif. Intell.
[18] Claude-Nicolas Fiechter. Expected Mistake Bound Model for On-Line Reinforcement Learning, 1997, ICML.
[19] Yishay Mansour, et al. Nash Convergence of Gradient Dynamics in General-Sum Games, 2000, UAI.
[20] Claude-Nicolas Fiechter, et al. Efficient reinforcement learning, 1994, COLT '94.
[21] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.