α-Rank: Multi-Agent Evaluation by Evolution

We introduce α-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and in the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α-Rank automatically provides a ranking over the set of agents under evaluation, along with insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of the correspondence we establish to the dynamical MCC solution concept when the underlying evolutionary model’s ranking-intensity parameter, α, is chosen to be large, which forms the basis of α-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley’s Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We provide mathematical proofs that not only give an overarching and unifying perspective on existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the α-Rank methodology. We illustrate the method in canonical games and empirically validate it in several domains, including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
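
To make the idea of ranking agents through a Markov chain with a ranking-intensity parameter α more concrete, the following is a minimal sketch for the simplest case: a single population playing a symmetric two-player empirical game. The abstract does not spell out the transition model, so the specifics here are assumptions: the logistic-style fixation probability, the finite population size `m`, the uniform mutation rate `eta`, and all function names are illustrative, not a definitive implementation of the method. The sketch builds a Markov chain over pure strategies, computes its stationary distribution by power iteration, and orders the strategies by stationary mass.

```python
import math


def fixation_probability(f_mutant, f_resident, alpha, m):
    """Assumed form: chance that one mutant takes over a population of size m."""
    x = alpha * (f_mutant - f_resident)
    if abs(x) < 1e-12:
        return 1.0 / m  # neutral drift when payoffs are equal
    if x > 0:
        return (1.0 - math.exp(-x)) / (1.0 - math.exp(-m * x))
    # Algebraically identical expression, rearranged to avoid overflow for x < 0.
    return (math.exp((m - 1) * x) - math.exp(m * x)) / (1.0 - math.exp(m * x))


def transition_matrix(payoffs, alpha, m):
    """Row-stochastic chain over pure strategies; payoffs[i][j] is i's payoff vs j."""
    n = len(payoffs)
    eta = 1.0 / (n - 1)  # assumed uniform choice of which mutant appears
    T = [[0.0] * n for _ in range(n)]
    for s in range(n):  # resident strategy
        for r in range(n):  # mutant strategy
            if r != s:
                T[s][r] = eta * fixation_probability(
                    payoffs[r][s], payoffs[s][r], alpha, m
                )
        T[s][s] = 1.0 - sum(T[s])  # remaining mass stays with the resident
    return T


def stationary_distribution(T, iters=10000, tol=1e-12):
    """Power iteration on pi <- pi T, starting from the uniform distribution."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[s] * T[s][r] for s in range(n)) for r in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def rank_strategies(payoffs, alpha=5.0, m=50):
    """Return (stationary distribution, strategy indices sorted by mass)."""
    pi = stationary_distribution(transition_matrix(payoffs, alpha, m))
    order = sorted(range(len(pi)), key=lambda i: -pi[i])
    return pi, order


if __name__ == "__main__":
    # Rock-paper-scissors: a cyclic game with no dominant strategy.
    rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    print(rank_strategies(rps))
    # Two strategies where strategy 0 beats strategy 1.
    dominated = [[0, 1], [-1, 0]]
    print(rank_strategies(dominated))
```

In this sketch, a finite α keeps every transition probability strictly positive, so the chain is irreducible and the stationary distribution is unique. In rock-paper-scissors the symmetry of the cycle spreads the mass roughly evenly over the three strategies, whereas in the second game almost all mass concentrates on the strategy that wins the pairwise comparison.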

[105]  Guy Lever,et al.  Emergent Coordination Through Competition , 2019, ICLR.