Paper information - Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method

Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method

This paper introduces NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural-network-based reinforcement learning algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
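The "store and reuse transitions" idea in the abstract can be illustrated with a minimal sketch of a fitted Q iteration loop. This is not the paper's implementation. It assumes a cost-minimising convention, a hypothetical four-state chain as the plant, and hypothetical names (`fit_table`, `fitted_q_iteration`, `greedy`). To stay dependency-free, the regressor here is an averaging lookup table; NFQ itself would plug a multi-layer perceptron into the `fit` argument.

```python
from collections import defaultdict

def fit_table(inputs, targets):
    """Stand-in regressor (NOT the paper's MLP): average the targets per input."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, t in zip(inputs, targets):
        sums[x] += t
        counts[x] += 1
    table = {x: sums[x] / counts[x] for x in sums}
    return lambda x: table.get(x, 0.0)

def fitted_q_iteration(transitions, actions, fit, gamma=0.9, n_iters=50):
    """transitions: stored list of (state, action, cost, next_state, done)."""
    q = lambda x: 0.0  # Q_0 is zero everywhere
    for _ in range(n_iters):
        inputs, targets = [], []
        for s, a, cost, s_next, done in transitions:
            # Every stored transition is reused to build one supervised target.
            future = 0.0 if done else min(q((s_next, b)) for b in actions)
            inputs.append((s, a))
            targets.append(cost + gamma * future)
        # Batch supervised fit on the whole pattern set yields Q_{k+1}.
        q = fit(inputs, targets)
    return q

def greedy(q, state, actions):
    """Pick the action with the lowest estimated cost-to-go."""
    return min(actions, key=lambda a: q((state, a)))

# Hypothetical toy plant: states 0..3 on a line, state 3 is the goal,
# each step costs 1, and moving left from state 0 stays at 0.
actions = (-1, 1)
transitions = []
for s in range(3):
    for a in actions:
        s_next = max(0, s + a)
        transitions.append((s, a, 1.0, s_next, s_next == 3))

q = fitted_q_iteration(transitions, actions, fit_table)
policy = {s: greedy(q, s, actions) for s in range(3)}
```

In this toy setting the transition set is collected once and never extended; the loop only re-labels it with new targets on each pass, which is the source of the data efficiency the abstract refers to.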

Martin A. Riedmiller

[1] Martin A. Riedmiller, et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm, 1993, IEEE International Conference on Neural Networks.

[2] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.

[3] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.

[4] Andrew G. Barto, et al. Reinforcement learning, 1998.

[5] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[6] Martin A. Riedmiller. Concepts and Facilities of a Neural Reinforcement Learning Control Architecture for Technical Process Control, 1999, Neural Computing & Applications.

[7] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.

[8] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.

[9] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.

[10] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.