Toward Off-Policy Learning Control with Function Approximation
暂无分享,去创建一个
Shalabh Bhatnagar | Richard S. Sutton | Csaba Szepesvári | Hamid Reza Maei | R. Sutton | H. Maei | Csaba Szepesvari | S. Bhatnagar
[1] Aleksej F. Filippov,et al. Differential Equations with Discontinuous Righthand Sides , 1988, Mathematics and Its Applications.
[2] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[3] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[4] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[5] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[6] A. Kruger. On Fréchet Subdifferentials , 2003 .
[7] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[8] A. Kruger. ON FR ´ ECHET SUBDIFFERENTIALS , 2003 .
[9] William D. Smart,et al. Interpolation-based Q-learning , 2004, ICML.
[10] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[11] A. Michel,et al. Analysis and design of switched normal systems , 2006 .
[12] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[13] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[14] R. Sutton,et al. A convergent O ( n ) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[15] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[16] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[17] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[18] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[19] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[20] Richard S. Sutton,et al. GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[21] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .