Approximate model-assisted Neural Fitted Q-Iteration

In this work, we propose an extension of the Neural Fitted Q-Iteration (NFQ) algorithm that uses a learned model to generate virtual trajectories, which are then used to update the Q-function. Compared to standard NFQ, this combination has the potential to greatly reduce the amount of system interaction required to learn a good policy, while still retaining the generalization ability of Q-learning. We provide a general formulation of approximate model-assisted fitted Q-learning and examine the advantages of its neural implementation with respect to interaction time and robustness. Its capabilities are illustrated with first results on a benchmark cart-pole regulation task, on which our method yields more general policies while requiring substantially less interaction time.
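To make the idea concrete, the sketch below outlines one possible form of model-assisted fitted Q-iteration as described above: a forward model is learned from the real transitions, virtual transitions are generated with it, and fitted Q-iteration is run on the pooled data. This is not the paper's implementation; the scikit-learn regressors, network sizes, state layout, and the `reward_fn` cost are illustrative assumptions standing in for the neural batch training used in NFQ.

```python
# A minimal sketch of model-assisted fitted Q-iteration (illustrative only).
import numpy as np
from sklearn.neural_network import MLPRegressor


def reward_fn(s, a, s_next):
    # Assumed regulation cost for cart-pole: penalize pole angle and cart offset.
    # The state layout (x, x_dot, theta, theta_dot) is an assumption.
    return -(s_next[:, 2] ** 2 + 0.1 * s_next[:, 0] ** 2)


def model_assisted_fqi(real_transitions, actions, n_virtual=1000,
                       n_iterations=20, gamma=0.95, seed=0):
    """Fitted Q-iteration on real transitions augmented with virtual
    transitions produced by a learned forward model (s, a) -> s'.

    real_transitions: list of (state, action, reward, next_state) tuples.
    actions: 1-D array of the discrete action values.
    """
    rng = np.random.default_rng(seed)
    S = np.array([t[0] for t in real_transitions])
    A = np.array([t[1] for t in real_transitions], dtype=float).reshape(-1, 1)
    R = np.array([t[2] for t in real_transitions], dtype=float)
    S2 = np.array([t[3] for t in real_transitions])

    # 1) Learn an approximate transition model from the real interaction data.
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)
    model.fit(np.hstack([S, A]), S2)

    # 2) Generate virtual transitions: start from observed states, apply
    #    randomly chosen actions, and predict the successor with the model.
    idx = rng.integers(len(S), size=n_virtual)
    Sv = S[idx]
    Av = rng.choice(actions, size=n_virtual).astype(float).reshape(-1, 1)
    S2v = model.predict(np.hstack([Sv, Av]))
    Rv = reward_fn(Sv, Av, S2v)

    # 3) Pool real and virtual data and run fitted Q-iteration on it.
    S_all, A_all = np.vstack([S, Sv]), np.vstack([A, Av])
    R_all, S2_all = np.concatenate([R,Rv]), np.vstack([S2, S2v])
    X = np.hstack([S_all, A_all])

    q = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500)
    targets = R_all.copy()
    for _ in range(n_iterations):
        q.fit(X, targets)  # supervised regression on the current targets
        # Bellman backup: maximum predicted Q over the discrete action set
        # at each successor state.
        q_next = np.max(
            [q.predict(np.hstack([S2_all, np.full((len(S2_all), 1), a)]))
             for a in actions],
            axis=0)
        targets = R_all + gamma * q_next
    return q
```

In this sketch the virtual transitions simply augment the batch used for the supervised Q-function updates; the relative amounts of real and virtual data (`n_virtual`) control how much the learned model is trusted over actual system interaction.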