Finite Sample Analyses for TD(0) With Function Approximation
Shie Mannor | Balázs Szörényi | Gugan Thoppe | Gal Dalal