Paper information - α-Rank: Multi-Agent Evaluation by Evolution

α-Rank: Multi-Agent Evaluation by Evolution

We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α-Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of the correspondence we establish to the dynamical MCC solution concept when the underlying evolutionary model’s ranking-intensity parameter, α, is chosen to be large, which exactly forms the basis of α-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley’s Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that not only provide an overarching and unifying perspective of existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the α-Rank methodology. 
We illustrate the method in canonical games and empirically validate it in several domains, including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
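To make the ranking idea in the abstract concrete, here is a minimal sketch for the simplest case: a single population playing a symmetric two-player empirical game. The abstract does not spell out the transition model, so the fixation-probability formula, the population size `m`, the uniform mutation rate `eta`, and the function names below are all assumptions chosen for illustration, not the paper's reference implementation. The sketch builds a Markov chain over pure strategies and ranks them by stationary mass.

```python
import numpy as np


def fixation_probability(f_mutant, f_resident, alpha, m):
    """Chance that a single mutant takes over a resident population of size m.

    Assumed form: (1 - exp(-alpha * d)) / (1 - exp(-m * alpha * d)),
    with d = f_mutant - f_resident, and 1/m when the two fitnesses are equal.
    """
    diff = f_mutant - f_resident
    if np.isclose(diff, 0.0):
        return 1.0 / m  # neutral drift
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio.
    return np.expm1(-alpha * diff) / np.expm1(-m * alpha * diff)


def alpha_rank(payoffs, alpha=1.0, m=10):
    """Stationary distribution over strategies of a symmetric two-player game.

    payoffs[i, j] is the payoff to strategy i when it meets strategy j.
    alpha is the ranking-intensity parameter; m is a hypothetical
    population size. Very large alpha * m can overflow in this naive form.
    """
    payoffs = np.asarray(payoffs, dtype=float)
    k = payoffs.shape[0]
    eta = 1.0 / (k - 1)  # assumed uniform choice among the other strategies
    chain = np.zeros((k, k))
    for r in range(k):          # resident strategy
        for s in range(k):      # mutant strategy
            if s != r:
                chain[r, s] = eta * fixation_probability(
                    payoffs[s, r], payoffs[r, s], alpha, m
                )
        chain[r, r] = 1.0 - chain[r].sum()  # stay with the resident otherwise
    # Solve pi^T chain = pi^T together with sum(pi) = 1 as a least-squares system.
    system = np.vstack([chain.T - np.eye(k), np.ones(k)])
    target = np.append(np.zeros(k), 1.0)
    pi, *_ = np.linalg.lstsq(system, target, rcond=None)
    return pi


# Rock-Paper-Scissors: a cyclic game with no dominant strategy.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
rps_ranking = alpha_rank(rps)

# A game where strategy 0 always earns more than strategy 1.
dominance = np.array([[2, 2], [1, 1]])
dominance_ranking = alpha_rank(dominance)
```

In the cyclic game the stationary mass is spread evenly over the three strategies, whereas in the dominance game nearly all of it sits on the dominant strategy. The full method described in the abstract also covers asymmetric and beyond-dyadic games, where the chain runs over joint pure strategy profiles rather than single strategies.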
