Efficient algorithms for learning to play repeated games against computationally bounded adversaries

We examine the problem of learning to play various games optimally against resource-bounded adversaries, with an explicit emphasis on the computational efficiency of the learning algorithm. We are especially interested in providing efficient algorithms for games other than penny-matching (in which payoff is received for matching the adversary's action in the current round), and for adversaries other than the classically studied finite automata. In particular, we examine games and adversaries for which the learning algorithm's past actions may strongly affect the adversary's future willingness to "cooperate" (that is, permit high payoff), and which therefore require carefully planned actions on the part of the learning algorithm. For example, in the game we call contract, both sides play 0 or 1 on each round, but our side receives payoff only if we play 1 in synchrony with the adversary; unlike penny-matching, playing 0 in synchrony with the adversary pays nothing. The name of the game is derived from the example of signing a contract, which becomes valid only if both parties sign (play 1).
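The difference between the two payoff rules described above can be made concrete with a minimal Python sketch. The function names, the unit payoff of 1 per successful round, and the helper `total_payoff` are illustrative assumptions, not notation taken from the paper.

```python
from typing import Callable, Sequence


def penny_matching_payoff(ours: int, theirs: int) -> int:
    """Penny-matching: we are paid whenever our action equals the adversary's."""
    # Assumption: a successful round pays exactly 1 unit.
    return 1 if ours == theirs else 0


def contract_payoff(ours: int, theirs: int) -> int:
    """Contract: we are paid only when both sides play 1 in the same round."""
    # Matching on 0 pays nothing, unlike penny-matching.
    return 1 if ours == 1 and theirs == 1 else 0


def total_payoff(
    payoff: Callable[[int, int], int],
    our_moves: Sequence[int],
    their_moves: Sequence[int],
) -> int:
    """Sum the per-round payoff over a finite sequence of rounds (hypothetical helper)."""
    return sum(payoff(a, b) for a, b in zip(our_moves, their_moves))


if __name__ == "__main__":
    ours = [0, 1, 1, 0]
    theirs = [0, 1, 0, 0]
    print(total_payoff(penny_matching_payoff, ours, theirs))
    print(total_payoff(contract_payoff, ours, theirs))
```

On the same sequence of plays, the contract rule never pays more than penny-matching, since rounds in which both sides play 0 count only under the latter.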
