Learning a Partial Behavior for a Competitive Robotic Soccer Agent

Robotic soccer is a highly competitive domain. Using learned behaviors in this field therefore requires not only learning algorithms that are known to converge and produce stable results, but also behaviors that are optimal or at least near-optimal, even in high-dimensional, continuous state and action spaces. This paper deals with the continuous improvement of adaptive soccer-playing skills in robotic soccer simulation, and documents the results of our search for optimal policies. We show that straightforward Reinforcement Learning algorithms can be realized in this domain with modest effort, but that tuning them towards competitiveness requires a great deal of work.