[1] J. Neumann. Zur Theorie der Gesellschaftsspiele, 1928.
[2] J. Neumann, et al. Theory of Games and Economic Behavior, 1945.
[3] Claude E. Shannon, et al. Programming a computer for playing chess, 1950.
[4] J. Robinson. An Iterative Method of Solving a Game, 1951, Classics in Game Theory.
[5] J. Nash. Non-Cooperative Games, 1951, Classics in Game Theory.
[6] L. S. Shapley, et al. A Simple Three-Person Poker Game, 1951.
[7] O. H. Brownlee, et al. Activity Analysis of Production and Allocation, 1952.
[8] H. W. Kuhn, et al. Extensive Games and the Problem of Information, 1953.
[9] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[10] Samuel Karlin, et al. Mathematical Methods and Theory in Games, Programming, and Economics, 1961.
[11] R. Bellman. Dynamic Programming, 1957, Science.
[12] J. Harsanyi. Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points, 1973.
[13] R. Selten. Reexamination of the perfectness concept for equilibrium points in extensive games, 1975, Classics in Game Theory.
[14] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[15] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[16] Jeffrey Scott Vitter, et al. Random sampling with a reservoir, 1985, TOMS.
[17] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.
[18] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[19] Roger B. Myerson, et al. Game Theory: Analysis of Conflict, 1991.
[20] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[21] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.
[22] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[23] Bernhard von Stengel, et al. Fast algorithms for finding randomized strategies in game trees, 1994, STOC '94.
[24] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[25] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[26] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[27] D. Fudenberg, et al. Consistency and Cautious Fictitious Play, 1995.
[28] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[29] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[30] D. Koller, et al. Efficient Computation of Equilibria for Extensive Two-Person Games, 1996.
[31] L. Shapley, et al. Fictitious Play Property for Games with Identical Interests, 1996.
[32] B. Stengel, et al. Efficient Computation of Behavior Strategies, 1996.
[33] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[34] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[35] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[36] H. J. Jacobsen, et al. Fictitious Play in Extensive Form Games, 1996.
[37] Ian Frank, et al. Search in Games with Incomplete Information: A Case Study Using Bridge Card Play, 1998, Artif. Intell.
[38] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[39] D. Fudenberg, et al. The Theory of Learning in Games, 1998.
[40] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[41] B. Jones. Bounded Rationality, 1999.
[42] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[43] David Sklansky, et al. The Theory of Poker, 1999.
[44] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[45] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region, 2000, NIPS.
[46] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[47] Manuela M. Veloso, et al. Rational and Convergent Learning in Stochastic Games, 2001, IJCAI.
[48] Jonathan Schaeffer, et al. The challenge of poker, 2002, Artif. Intell.
[49] Murray Campbell, et al. Deep Blue, 2002, Artif. Intell.
[50] William H. Sandholm, et al. On the Global Convergence of Stochastic Fictitious Play, 2002.
[51] A. Roth. The Economist as Engineer: Game Theory, Experimentation, and Computation as Tools for Design Economics, 2002.
[52] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[53] Jonathan Schaeffer, et al. Approximating Game-Theoretic Optimal Strategies for Full-scale Poker, 2003, IJCAI.
[54] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[55] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[56] Shie Mannor, et al. Dynamic abstraction in reinforcement learning via clustering, 2004, ICML.
[57] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[58] Josef Hofbauer, et al. Stochastic Approximations and Differential Inclusions, 2005, SIAM J. Control. Optim.
[59] Robert L. Smith, et al. A Fictitious Play Approach to Large-Scale Optimization, 2005, Oper. Res.
[60] Marcus Hutter. Simulation Algorithms for Computational Systems Biology, 2017, Texts in Theoretical Computer Science. An EATCS Series.
[61] Michael H. Bowling, et al. Bayes' Bluff: Opponent Modelling in Poker, 2005, UAI.
[62] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[63] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[64] Jeff S. Shamma, et al. Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria, 2005, IEEE Transactions on Automatic Control.
[65] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[66] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[67] Tuomas Sandholm, et al. A Competitive Texas Hold'em Poker Player via Automated Abstraction and Real-Time Equilibrium Computation, 2006, AAAI.
[68] Michael Kearns, et al. Reinforcement learning for optimized trade execution, 2006, ICML.
[69] David S. Leslie, et al. Generalised weakened fictitious play, 2006, Games Econ. Behav.
[70] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[71] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[72] Javier Peña, et al. Gradient-Based Algorithms for Finding Nash Equilibria in Extensive Form Games, 2007, WINE.
[73] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.
[74] Yoshua Bengio, et al. Learning Deep Architectures for AI, 2007, Found. Trends Mach. Learn.
[75] Geoffrey J. Gordon, et al. A Fast Bundle-based Anytime Algorithm for Poker and other Convex Games, 2007, AISTATS.
[76] S. Legg. Machine Super Intelligence, 2008.
[77] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.
[78] Bret Hoehn, et al. Effective short-term opponent exploitation in simplified poker, 2005, Machine Learning.
[79] S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
[80] Ana L. C. Bazzan, et al. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control, 2009, Autonomous Agents and Multi-Agent Systems.
[81] Aurélien Garivier, et al. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems, 2008.
[82] Raymond J. Dolan, et al. Game Theory of Mind, 2008, PLoS Comput. Biol.
[83] Joel Veness, et al. Bootstrapping from Game Tree Search, 2009, NIPS.
[84] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[85] Kevin Waugh, et al. A Practical Use of Imperfect Recall, 2009, SARA.
[86] Kevin Waugh, et al. Monte Carlo Sampling for Regret Minimization in Extensive Games, 2009, NIPS.
[87] Kevin Waugh, et al. Abstraction pathologies in extensive games, 2009, AAMAS.
[88] Tuomas Sandholm, et al. Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information, 2009, IJCAI.
[89] Martin A. Riedmiller, et al. Reinforcement learning for robot soccer, 2009, Auton. Robots.
[90] Duane Szafron, et al. Using counterfactual regret minimization to create competitive multiplayer poker agents, 2010, AAMAS.
[91] Scott Kuindersma, et al. Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories, 2010, NIPS.
[92] Javier Peña, et al. Smoothing Techniques for Computing Nash Equilibria of Sequential Games, 2010, Math. Oper. Res.
[93] Tuomas Sandholm, et al. The State of Solving Large Incomplete-Information Games, and Application to Poker, 2010, AI Mag.
[94] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.
[95] Marc Lanctot, et al. Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling, 2011, J. Artif. Intell. Res.
[96] Ian D. Watson, et al. Computer poker: A review, 2011, Artif. Intell.
[97] Kevin Waugh, et al. Accelerating Best Response Calculation in Large Extensive Games, 2011, IJCAI.
[98] Doina Precup, et al. Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics, 2011, EWRL.
[99] David Auger, et al. Multiple Tree for Partially Observable Monte-Carlo Tree Search, 2011, EvoApplications.
[100] Milind Tambe, et al. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned, 2011.
[101] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[102] Tuomas Sandholm, et al. Lossy stochastic game abstraction with bounds, 2012, EC '12.
[103] Peter I. Cowling, et al. Information Set Monte Carlo Tree Search, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[104] Michèle Sebag, et al. The grand challenge of computer Go, 2012, Commun. ACM.
[105] Yee Whye Teh, et al. Actor-Critic Reinforcement Learning with Energy-Based Policies, 2012, EWRL.
[106] Michael H. Bowling, et al. Finding Optimal Abstract Strategies in Extensive-Form Games, 2012, AAAI.
[107] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[108] Nathan R. Sturtevant, et al. A parameterized family of equilibrium profiles for three-player Kuhn poker, 2013, AAMAS.
[109] Branislav Bosanský, et al. Convergence of Monte Carlo Tree Search in Simultaneous Move Games, 2013, NIPS.
[110] M. Littman, et al. Solving for Best Responses in Extensive-Form Games using Reinforcement Learning Methods, 2013.
[111] Michael H. Bowling, et al. Monte Carlo sampling and regret minimization for equilibrium computation and decision-making in large extensive form games, 2013.
[112] Michael H. Bowling, et al. Evaluating state-space abstractions in extensive-form games, 2013, AAMAS.
[113] Daniel Urieli, et al. TacTex'13: A Champion Adaptive Power Trading Agent, 2014, AAAI.
[114] Tuomas Sandholm, et al. Extensive-form game abstraction with bounds, 2014, EC.
[115] J. Heinrich, et al. Self-play Monte-Carlo tree search in computer poker, 2014, AAAI 2014.
[116] Michael H. Bowling, et al. Solving Imperfect Information Games Using Decomposition, 2013, AAAI.
[117] Branislav Bosanský, et al. An Exact Double-Oracle Algorithm for Zero-Sum Extensive-Form Games with Imperfect Information, 2014, J. Artif. Intell. Res.
[118] V. Lisý. Alternative Selection Functions for Information Set Monte Carlo Tree Search, 2014.
[119] Ashwin Lall, et al. Exponential Reservoir Sampling for Streaming Language Models, 2014, ACL.
[120] Michael H. Bowling, et al. Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games, 2015, AAMAS.
[121] Neil Burch, et al. Heads-up limit hold'em poker is solved, 2015, Science.
[122] Branislav Bosanský, et al. Optimal Network Security Hardening Using Attack Graph Games, 2015, IJCAI.
[123] David Silver, et al. Move Evaluation in Go Using Deep Convolutional Neural Networks, 2014, ICLR.
[124] Jürgen Schmidhuber. Deep learning in neural networks: An overview, 2014, Neural Networks.
[125] Tuomas Sandholm, et al. Simultaneous Abstraction and Equilibrium Finding in Games, 2015, IJCAI.
[126] Kevin Waugh, et al. Solving Games with Functional Regret Estimation, 2014, AAAI Workshop: Computer Poker and Imperfect Information.
[127] Tuomas Sandholm, et al. Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold'em Agent, 2015, AAAI Workshop: Computer Poker and Imperfect Information.
[128] David Silver, et al. Fictitious Self-Play in Extensive-Form Games, 2015, ICML.
[129] David Silver, et al. Smooth UCT Search in Computer Poker, 2015, IJCAI.
[130] Peter I. Cowling, et al. Emergent bluffing and inference with Monte Carlo Tree Search, 2015, IEEE Conference on Computational Intelligence and Games (CIG).
[131] Tuomas Sandholm, et al. Endgame Solving in Large Imperfect-Information Games, 2015, AAAI Workshop: Computer Poker and Imperfect Information.
[132] Amos J. Storkey, et al. Training Deep Convolutional Neural Networks to Play Go, 2015, ICML.
[133] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[134] Peter Dayan, et al. Monte Carlo Planning Method Estimates Planning Horizons during Interactive Social Exchange, 2015, PLoS Comput. Biol.
[135] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[136] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[137] Colin Raffel, et al. Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games Using Convolutional Networks, 2015, AAAI.
[138] Kevin Waugh, et al. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker, 2017, ArXiv.
[139] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[140] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.