CBR for State Value Function Approximation in Reinforcement Learning

Case-based reasoning (CBR) is one of the techniques that can be applied to the task of approximating a function over high-dimensional, continuous spaces. In Reinforcement Learning, a learning agent is faced with the problem of assessing the desirability of the state it finds itself in. If the state space is very large and/or continuous, a suitable mechanism to approximate a value function – which estimates the value of individual states – is of crucial importance. In this paper, we investigate the use of case-based methods to realise that task. The approach we take is evaluated in a case study in robotic soccer simulation.
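The general idea can be illustrated with a minimal sketch of a case-based value function approximator: states with known value estimates are stored as cases, a query state is valued by a distance-weighted average over its nearest neighbours, and learning updates either adjust a nearby case or add a new one. The class name, the parameters (k, kernel width, add threshold), and the TD-style update rule below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of case-based value function approximation over a continuous state
# space, assuming a k-nearest-neighbour case base with Gaussian distance
# weighting. All names and parameters are illustrative assumptions.
import numpy as np

class CaseBaseValueFunction:
    def __init__(self, k=5, kernel_width=1.0):
        self.k = k
        self.width = kernel_width
        self.states = []   # stored case states (feature vectors)
        self.values = []   # stored value estimates for those states

    def predict(self, state):
        """Estimate V(state) as a distance-weighted average of the k nearest cases."""
        if not self.states:
            return 0.0
        X = np.asarray(self.states)
        d = np.linalg.norm(X - np.asarray(state, dtype=float), axis=1)
        idx = np.argsort(d)[: self.k]
        w = np.exp(-(d[idx] / self.width) ** 2)   # Gaussian similarity weights
        return float(np.dot(w, np.asarray(self.values)[idx]) / (w.sum() + 1e-12))

    def update(self, state, target, lr=0.2, add_threshold=0.5):
        """TD-style update: move the closest case toward the target, or store a
        new case if the query state is far from all existing cases."""
        if self.states:
            X = np.asarray(self.states)
            d = np.linalg.norm(X - np.asarray(state, dtype=float), axis=1)
            nearest = int(np.argmin(d))
            if d[nearest] < add_threshold:
                self.values[nearest] += lr * (target - self.values[nearest])
                return
        self.states.append(np.asarray(state, dtype=float))
        self.values.append(float(target))
```

In a TD(0)-style loop, `target` would be the bootstrapped return r + γ·predict(s'), so the case base grows only where the agent actually visits the state space, which is what makes such memory-based approximators attractive for large continuous domains like simulated soccer.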
