Sequential decision making in large domains is computationally expensive. With the classical dynamic programming approach, growing problem size quickly leads to intractability due to time and memory constraints. This situation can be significantly remedied by combining more advanced reinforcement learning techniques with generalizing function approximators. However, doing so may lead to unstable learning behaviour, as the strict convergence guarantees no longer hold. This paper presents an approach to stabilizing learning by gradually reducing the search space for the optimal decision policy: the action set is iteratively adapted according to the progress of learning. Experiments are described within the FYNESSE control architecture, a framework for autonomously learning adaptive control strategies.