Incremental multi-step Q-learning
 Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
 Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
 C. Watkins. Learning from delayed rewards, 1989.
 Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.
 Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
 Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
 Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
 Michael I. Jordan, et al. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms, 1993, Neural Computation.
 Mark D. Pendrith. On Reinforcement Learning of Control Actions in Noisy and Non-Markovian Domains, 1994.
 Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
 Pawel Cichosz, et al. Fast and Efficient Reinforcement Learning with Truncated Temporal Differences, 1995, ICML.
 Leslie Pack Kaelbling, et al. On reinforcement learning for robots, 1996, IROS.
 Peter Dayan, et al. Q-learning, 1992, Machine Learning.
 Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
 Andrew W. Moore, et al. Prioritized sweeping: Reinforcement learning with less data and less time, 1993, Machine Learning.
 Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.
 Richard S. Sutton, et al. Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.