Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
 Michael I. Jordan,et al. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms , 1993, Neural Computation.
 Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
 Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.
 Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
 Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
 Richard S. Sutton,et al. Learning to Predict by the Methods of Temporal Differences , 1988, Machine Learning.
 Pawel Cichosz,et al. Fast and Efficient Reinforcement Learning with Truncated Temporal Differences , 1995, ICML.
 J. Peng,et al. Efficient learning and planning within the Dyna framework , 1993, IEEE International Conference on Neural Networks.