Fast gradient-descent methods for temporal-difference learning with linear function approximation
Shalabh Bhatnagar | Doina Precup | Richard S. Sutton | Csaba Szepesvári | David Silver | Hamid Reza Maei | Eric Wiewiora