SIMULTANEOUS STATE ESTIMATION AND LEARNING IN REPEATED COURNOT GAMES

This article proposes an intelligent agent that can make sound decisions in a repeated Cournot game with incomplete information, where neither the market model nor the competitors' decision models are known to the players. The agent combines the k-nearest neighbor (KNN) method with a Bayes classifier to predict its rivals' next actions from the history of market decisions. It treats the predicted actions as an estimate of its next state and learns the expected payoff of its state-action pairs online using a reinforcement learning (RL) algorithm. The agent is pitted against two benchmark competitors in several simulated Cournot games, and the simulation results show that it earns significantly higher payoffs than both benchmark agents.
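To make the mechanism concrete, the sketch below illustrates the core loop in Python: predict the rivals' next joint action from the decision history (here with a plain k-NN majority vote; the paper's combination with a Bayes classifier is not reproduced), treat that prediction as the next state, and update state-action payoff estimates with a standard Q-learning rule. The parameter values, the discretized quantity grid, the distance measure, and the fallback rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from collections import defaultdict

# Sketch of the described agent: predict rivals' next actions from the
# decision history with k-NN, use the prediction as the next state, and
# learn state-action values with Q-learning. Names and constants below
# are assumptions for illustration only.

K = 5                              # neighbors for rival-action prediction (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration (assumed)
ACTIONS = np.arange(0, 11)         # discretized own quantities (assumed grid)

history = []                       # list of (rival_profile, next_rival_profile) pairs
Q = defaultdict(float)             # Q[(state, action)] -> expected payoff estimate

def predict_rivals(current_profile):
    """k-NN estimate of the rivals' next joint action (majority vote)."""
    if len(history) < K:
        return current_profile     # fallback: assume rivals repeat their last move
    dists = [np.linalg.norm(np.array(p) - np.array(current_profile))
             for p, _ in history]
    idx = np.argsort(dists)[:K]
    votes = [tuple(history[i][1]) for i in idx]
    return max(set(votes), key=votes.count)

def choose_action(state):
    """Epsilon-greedy action selection over the discretized quantity grid."""
    if np.random.rand() < EPS:
        return np.random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, payoff, next_state):
    """Standard Q-learning update of the state-action payoff estimate."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (payoff + GAMMA * best_next - Q[(state, action)])
```

In this reading, the state estimation step (predict_rivals) and the learning step (update) run simultaneously each period: the agent observes the realized rival quantities, appends the transition to the history, and reuses the refreshed predictor to form the next state before choosing its own quantity.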
