Temporal difference learning

This chapter contains sections titled: TD Prediction; Advantages of TD Prediction Methods; Optimality of TD(0); Sarsa: On-Policy TD Control; Q-Learning: Off-Policy TD Control; Actor-Critic Methods; R-Learning for Undiscounted Continuing Tasks; Games, Afterstates, and Other Special Cases; Summary; Bibliographical and Historical Remarks.