Two-Timescale Stochastic Approximation Convergence Rates with Applications to Reinforcement Learning