Multi-agent learning for engineers

As the title of Shoham, Powers, and Grenager's position paper suggests [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue], the ultimate lens through which the multi-agent learning framework should be assessed is "what is the question?". In this paper, we address this question by presenting challenges motivated by engineering applications and discussing the potential appeal of multi-agent learning for meeting these challenges. We also highlight differences in the underlying assumptions and issues of concern that generally distinguish engineering applications from the models typically considered in the economic game theory literature.
