Efficient Value Function Approximation Using Regression Trees

Value function approximation is a problem central to reinforcement learning. Many applications of reinforcement learning have relied on neural network function approximators, which are very slow to train and require substantial parameter tweaking to obtain good performance. Other reinforcement learning studies have applied nearest neighbor and CMAC function approximators, but these cannot scale to problems with many features, especially if some features are irrelevant. We describe initial work on a new function approximation method that uses regression trees to represent value functions. A novel aspect of our method is its error criterion, which combines three terms: the supervised training error, a Bellman error term, and an advantage error term. By using this composite error criterion, we are able to combine many of the benefits of fitted value iteration, TD(0), and advantage updating. The new method is compared experimentally to previous work that employed TD(λ) to solve job-shop scheduling problems (Zhang & Dietterich, 1996). The results show that the new method performs as well as the neural network method employed in that work, and that it can be trained in much less time. Our new method shows promise of providing a function approximator that is much more efficient and much easier to apply than neural network methods.
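
As a rough sketch of the composite error criterion (the weights w_s, w_b, w_a and the precise form of each term below are illustrative assumptions, not definitions taken from the abstract), the training error for a value function V_θ fitted by the regression tree might take the form

\[
E(\theta) \;=\; w_s \sum_{s \in \mathcal{D}} \bigl(V_\theta(s) - \tilde{V}(s)\bigr)^2
\;+\; w_b \sum_{s \in \mathcal{D}} \bigl(V_\theta(s) - [\,r(s) + \gamma\, V_\theta(s')\,]\bigr)^2
\;+\; w_a\, E_{\mathrm{adv}}(\theta),
\]

where \(\tilde{V}(s)\) is a supervised target value for state \(s\), \(s'\) is the observed successor state, and \(E_{\mathrm{adv}}(\theta)\) is an advantage-style term penalizing fits under which a non-greedy action appears better than the greedy one. The first term corresponds to fitted value iteration, the second to a TD(0)-style Bellman residual, and the third to advantage updating, mirroring the three benefits listed above.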