Model-based reinforcement learning for evolving soccer strategies

We use reinforcement learning (RL) to evolve soccer team strategies. RL can profit significantly from world models (WMs), but in high-dimensional, continuous input spaces, learning accurate WMs is intractable. In this chapter, we show that even incomplete WMs can help to find good policies quickly. Our approach is based on a novel combination of CMACs (cerebellar model articulation controllers) and prioritized sweeping; variants of this approach outperform the algorithms used in previous work.
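To make the combination concrete, here is a minimal sketch of the two ingredients: a CMAC that tile-codes a continuous input into a discrete, hashable code, and tabular prioritized sweeping run on those codes. All class names, parameters, and the one-dimensional input are illustrative assumptions, not the chapter's implementation; in particular, the world model below is a deterministic last-transition table, whereas a learned WM would generally be stochastic and incomplete.

```python
import heapq
from collections import defaultdict

class CMAC:
    """Tile coding over a 1-D input; several offset tilings give coarse coding.
    A real soccer state is a high-dimensional vector; one dimension keeps the
    sketch short."""
    def __init__(self, n_tilings=4, n_tiles=8, low=0.0, high=1.0):
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.low, self.width = low, (high - low) / n_tiles

    def code(self, x):
        """Return a hashable tuple of (tiling, tile-index) pairs for state x."""
        cells = []
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings
            idx = int((x - self.low + offset) / self.width)
            cells.append((t, max(0, min(idx, self.n_tiles))))
        return tuple(cells)

class PrioritizedSweeping:
    """Tabular prioritized sweeping over discrete CMAC codes.
    Simplifying assumption: a deterministic one-step model that stores the
    last observed transition per state-action pair."""
    def __init__(self, actions, gamma=0.95, theta=1e-4, n_sweeps=10):
        self.actions, self.gamma = actions, gamma
        self.theta, self.n_sweeps = theta, n_sweeps
        self.Q = defaultdict(float)
        self.model = {}                # (s, a) -> (r, s')
        self.preds = defaultdict(set)  # s' -> {(s, a), ...}
        self.pq = []                   # max-heap via negated priorities

    def _max_q(self, s):
        return max(self.Q[(s, a)] for a in self.actions)

    def _push(self, s, a):
        # Priority = magnitude of the one-step Bellman error.
        r, s2 = self.model[(s, a)]
        p = abs(r + self.gamma * self._max_q(s2) - self.Q[(s, a)])
        if p > self.theta:
            heapq.heappush(self.pq, (-p, s, a))

    def observe(self, s, a, r, s2):
        """Record one real transition, then sweep the most urgent backups."""
        self.model[(s, a)] = (r, s2)
        self.preds[s2].add((s, a))
        self._push(s, a)
        for _ in range(self.n_sweeps):
            if not self.pq:
                break
            _, s_, a_ = heapq.heappop(self.pq)
            r_, s2_ = self.model[(s_, a_)]
            self.Q[(s_, a_)] = r_ + self.gamma * self._max_q(s2_)
            for sp, ap in self.preds[s_]:  # re-examine predecessors
                self._push(sp, ap)
```

A usage sketch, with hypothetical state values and a three-action set: discretize with the CMAC, then learn on the resulting codes.

```python
cmac = CMAC()
agent = PrioritizedSweeping(actions=(0, 1, 2))
s, s2 = cmac.code(0.37), cmac.code(0.41)
agent.observe(s, a=1, r=0.0, s2=s2)
```

The point of the combination is that the CMAC keeps the model small enough to be learnable in a continuous space, while prioritized sweeping concentrates the model-based backups where the Bellman error is largest, so even an incomplete model yields useful policy improvements quickly.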
