Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments
[1] John N. Tsitsiklis,et al. Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[2] A. Shwartz,et al. Guaranteed performance regions in Markovian systems with competing decision makers , 1993, IEEE Trans. Autom. Control.
[3] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[4] J. Filar,et al. Competitive Markov Decision Processes , 1996 .
[5] Vladimir Vovk,et al. A game of prediction with expert advice , 1995, COLT '95.
[7] James Hannan,et al. Approximation to Bayes Risk in Repeated Play , 1958 .
[8] A. Rustichini. Minimizing Regret : The General Case , 1999 .
[9] S. Hart,et al. A simple adaptive procedure leading to correlated equilibrium , 2000 .
[10] Y. Freund,et al. Adaptive game playing using multiplicative weights , 1999 .
[11] Shie Mannor,et al. Regret Minimization in Signal Space for Repeated Matrix Games with Partial Observations , 2000 .
[12] D. Bertsekas,et al. Stochastic Shortest Path Games , 1999 .
[13] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[14] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[15] Nicolò Cesa-Bianchi,et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[16] D. Blackwell. An analog of the minimax theorem for vector payoffs , 1956 .
[17] D. Fudenberg,et al. Consistency and Cautious Fictitious Play , 1995 .
[18] Dimitri P. Bertsekas,et al. Stochastic shortest path games: theory and algorithms , 1997 .