Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning
Gal Dalal | Balázs Szörényi | Gugan Thoppe | Shie Mannor
[1] Csaba Szepesvári, et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?, 2018, AISTATS.
[2] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[3] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[4] H. Kushner. A projected stochastic approximation method for adaptive filters and identifiers, 1980.
[5] Francis R. Bach, et al. Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions, 2014, AISTATS.
[6] J. Zico Kolter, et al. The Fixed Points of Off-Policy TD, 2011, NIPS.
[7] V. Lakshmikantham, et al. Method of Variation of Parameters for Dynamic Systems, 1998.
[8] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[9] L. Gerencsér. Rate of convergence of moments of Spall's SPSA method, 1997, European Control Conference (ECC).
[10] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[11] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[12] A. Mokkadem, et al. Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms, 2006, math/0610329.
[13] J. Tsitsiklis, et al. Convergence rate of linear two-time-scale stochastic approximation, 2004, math/0405287.
[14] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[15] Shie Mannor, et al. Finite Sample Analyses for TD(0) With Function Approximation, 2017, AAAI.
[16] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[17] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[18] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.
[19] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.
[20] V. Borkar, et al. A Concentration Bound for Stochastic Approximation via Alekseev’s Formula, 2015, Stochastic Systems.
[21] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[22] Morris W. Hirsch, et al. Differential Equations, Dynamical Systems, and an Introduction to Chaos, 2016.
[23] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[24] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, 1992.
[25] Shalabh Bhatnagar, et al. A stability criterion for two timescale stochastic approximation schemes, 2017, Autom.
[26] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[27] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[28] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[29] Nathaniel Korda, et al. On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence, 2014, ICML.
[30] T. Sideris. Ordinary Differential Equations and Dynamical Systems, 2013.