Reinforcement Learning for RoboCup Soccer Keepaway

RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
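The algorithm named above — episodic Sarsa(λ) with replacing eligibility traces over a linear, tile-coded value function — can be sketched compactly. The sketch below is illustrative only: it uses a toy 1-D "reach the goal" episodic task with two actions and a hand-rolled tile coder, not the paper's actual keepaway state variables, tilings, or parameters, all of which are assumptions made here for the sake of a runnable example.

```python
import random

# Sketch of episodic linear Sarsa(lambda) with tile coding and replacing
# traces. The toy task, constants, and tile-coder layout are illustrative
# assumptions, not the keepaway setup from the paper.

NUM_TILINGS = 8
TILES_PER_TILING = 10
NUM_ACTIONS = 2   # 0 = step left, 1 = step right (toy task)
NUM_FEATURES = NUM_TILINGS * TILES_PER_TILING * NUM_ACTIONS

def active_features(state, action):
    """Indices of the binary tile features active for (state, action)."""
    feats = []
    for t in range(NUM_TILINGS):
        offset = t / (NUM_TILINGS * TILES_PER_TILING)  # shift each tiling
        tile = int((state + offset) * TILES_PER_TILING) % TILES_PER_TILING
        feats.append((t * TILES_PER_TILING + tile) * NUM_ACTIONS + action)
    return feats

def q_value(w, state, action):
    """Linear value: sum of weights of the active tiles."""
    return sum(w[i] for i in active_features(state, action))

def epsilon_greedy(w, state, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(NUM_ACTIONS)
    return max(range(NUM_ACTIONS), key=lambda a: q_value(w, state, a))

def env_step(s, a):
    """Toy episodic task: reach s >= 1.0; every step costs -1."""
    s2 = s + 0.2 if a == 1 else max(0.0, s - 0.1)
    return s2, -1.0, s2 >= 1.0

def run_episode(w, z, alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1, rng=None):
    """One episode of Sarsa(lambda); returns the episode length in steps."""
    rng = rng or random.Random(0)
    for i in range(NUM_FEATURES):
        z[i] = 0.0                          # clear traces at episode start
    s, steps = 0.0, 0
    a = epsilon_greedy(w, s, epsilon, rng)
    done = False
    while not done:
        s2, r, done = env_step(s, a)
        steps += 1
        delta = r - q_value(w, s, a)        # TD error (terminal target = r)
        for i in active_features(s, a):
            z[i] = 1.0                      # replacing traces
        if not done:
            a2 = epsilon_greedy(w, s2, epsilon, rng)
            delta += gamma * q_value(w, s2, a2)
        for i in range(NUM_FEATURES):
            w[i] += (alpha / NUM_TILINGS) * delta * z[i]
            z[i] *= gamma * lam             # decay all traces
        if not done:
            s, a = s2, a2
    return steps

# Train on the toy task; episode lengths shrink as the policy improves.
rng = random.Random(42)
w = [0.0] * NUM_FEATURES
z = [0.0] * NUM_FEATURES
lengths = [run_episode(w, z, rng=rng) for _ in range(300)]
```

Dividing the step size by the number of tilings is the usual normalization for tile coding, so that the total update to a state-action value per step is `alpha * delta` regardless of how many tilings are used.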
