GQ($\lambda$) Quick Reference and Implementation Guide

This document is intended as a quick reference for, and guide to, the implementation of linear GQ($\lambda$), a gradient-based off-policy temporal-difference learning algorithm. Explanations of the intuition and theory behind the algorithm are provided elsewhere (e.g., Maei & Sutton 2010, Maei 2011). If you have questions or concerns about the content of this document or the attached Java code, please email Adam White (adam.white@ualberta.ca). The code is provided as part of the source files in the arXiv submission.