SIMULTANEOUS STATE ESTIMATION AND LEARNING IN REPEATED COURNOT GAMES

This article proposes an intelligent agent that can make sound decisions in a repeated Cournot game with incomplete information, where neither the market model nor the competitors' decision models are known to the players. The agent combines the k-nearest neighbor (KNN) method with a Bayes classifier to predict its rivals' next actions from the history of market decisions. It treats the predicted actions as an estimate of its next state and learns the expected payoff of its state-action pairs online using a reinforcement learning (RL) algorithm. The agent is pitted against two benchmark competitors in several simulated Cournot games, and the simulation results show that it earns significantly higher payoffs than both benchmark agents.
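To make the mechanism concrete, the sketch below illustrates the core loop in Python: predict the rivals' next joint action from the decision history (here with a plain k-NN majority vote; the paper's combination with a Bayes classifier is not reproduced), treat that prediction as the next state, and update state-action payoff estimates with a standard Q-learning rule. The parameter values, the discretized quantity grid, the distance measure, and the fallback rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from collections import defaultdict

# Sketch of the described agent: predict rivals' next actions from the
# decision history with k-NN, use the prediction as the next state, and
# learn state-action values with Q-learning. Names and constants below
# are assumptions for illustration only.

K = 5                              # neighbors for rival-action prediction (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration (assumed)
ACTIONS = np.arange(0, 11)         # discretized own quantities (assumed grid)

history = []                       # list of (rival_profile, next_rival_profile) pairs
Q = defaultdict(float)             # Q[(state, action)] -> expected payoff estimate

def predict_rivals(current_profile):
    """k-NN estimate of the rivals' next joint action (majority vote)."""
    if len(history) < K:
        return current_profile     # fallback: assume rivals repeat their last move
    dists = [np.linalg.norm(np.array(p) - np.array(current_profile))
             for p, _ in history]
    idx = np.argsort(dists)[:K]
    votes = [tuple(history[i][1]) for i in idx]
    return max(set(votes), key=votes.count)

def choose_action(state):
    """Epsilon-greedy action selection over the discretized quantity grid."""
    if np.random.rand() < EPS:
        return np.random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, payoff, next_state):
    """Standard Q-learning update of the state-action payoff estimate."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (payoff + GAMMA * best_next - Q[(state, action)])
```

In this reading, the state estimation step (predict_rivals) and the learning step (update) run simultaneously each period: the agent observes the realized rival quantities, appends the transition to the history, and reuses the refreshed predictor to form the next state before choosing its own quantity.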
