XCS with computed prediction in continuous multistep environments

We apply XCS with computed prediction (XCSF) to multistep reinforcement learning problems involving continuous inputs; in essence, we use XCSF as a method of generalized reinforcement learning. We show that, in domains involving continuous inputs and delayed rewards, XCSF can evolve compact populations of accurate, maximally general classifiers that represent the optimal solution to the target problem. We compare the performance of XCSF with that of tabular Q-learning adapted to the continuous domains considered here. Our results show that XCSF can converge much faster than tabular Q-learning while producing more compact solutions. They also suggest that when exploration is less effective in some areas of the problem space, XCSF can exploit effective generalizations to extend the evolved knowledge beyond the frequently explored areas, whereas in the same situations the convergence speed of tabular Q-learning worsens.
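The abstract gives no implementation details, so the following Python code is only a minimal sketch, under stated assumptions, of the two ingredients it relies on. Following Wilson's XCSF, each classifier computes its prediction as a linear function of the current input, w · (x0, x), and updates the weights with a normalized delta rule; in multistep problems the target payoff is the Q-learning style value r + γ max_a P(s', a). The names `Classifier`, `system_prediction`, and `multistep_target`, the parameter values (x0 = 1, η = 0.2, γ = 0.9), and the fixed fitness are hypothetical; the genetic algorithm, covering, subsumption, and the accuracy-based fitness update of full XCSF are omitted.

```python
import numpy as np


class Classifier:
    """Hypothetical minimal XCSF-style classifier: an interval (hyperrectangle)
    condition, an action, and a linear computed prediction w . (x0, x)."""

    def __init__(self, lower, upper, action, x0=1.0, eta=0.2, fitness=1.0):
        self.lower = np.asarray(lower, dtype=float)  # interval lower bounds
        self.upper = np.asarray(upper, dtype=float)  # interval upper bounds
        self.action = action
        self.x0 = x0            # constant term prepended to every input
        self.eta = eta          # learning rate of the normalized delta rule
        self.fitness = fitness  # assumption: fixed here, not evolved
        self.w = np.zeros(self.lower.size + 1)  # weights; w[0] multiplies x0

    def matches(self, x):
        # the classifier matches if every input lies inside its interval
        x = np.asarray(x, dtype=float)
        return bool(np.all((self.lower <= x) & (x <= self.upper)))

    def augment(self, x):
        return np.concatenate(([self.x0], np.asarray(x, dtype=float)))

    def predict(self, x):
        # computed prediction: a linear function of the current input
        return float(self.w @ self.augment(x))

    def update(self, x, target):
        # normalized delta rule: w += eta / |x'|^2 * (P - p(x)) * x'
        xa = self.augment(x)
        error = target - self.w @ xa
        self.w += (self.eta / (xa @ xa)) * error * xa


def system_prediction(population, x, action):
    """Fitness-weighted average of the computed predictions of the matching
    classifiers that advocate `action`; None if no classifier matches."""
    advocates = [c for c in population if c.action == action and c.matches(x)]
    if not advocates:
        return None
    total = sum(c.fitness for c in advocates)
    return sum(c.fitness * c.predict(x) for c in advocates) / total


def multistep_target(reward, population, next_x, actions, gamma=0.9, terminal=False):
    """Q-learning-style payoff P = r + gamma * max_a P(next_x, a)."""
    if terminal:
        return reward
    preds = [system_prediction(population, next_x, a) for a in actions]
    preds = [p for p in preds if p is not None]
    return reward + gamma * max(preds) if preds else reward


if __name__ == "__main__":
    # Usage: fit one classifier covering [0, 1] to a linear payoff 2x + 1.
    rng = np.random.default_rng(0)
    c = Classifier(lower=[0.0], upper=[1.0], action=0)
    for _ in range(2000):
        x = rng.uniform(0.0, 1.0, size=1)
        c.update(x, target=2.0 * x[0] + 1.0)
    print(round(c.predict([0.5]), 3))
```

Because the prediction is computed from the input rather than stored as a constant, a single classifier can cover a whole interval whose payoff varies linearly; in the usage example the prediction at x = 0.5 should approach 2.0. A tabular method, by contrast, would need a separate entry for every discretized cell of that interval.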
