Shie Mannor | Balázs Szörényi | Gugan Thoppe | Gal Dalal
[1] Carlos S. Kubrusly, et al. Stochastic approximation algorithms and applications, 1973, CDC 1973.
[2] V. Lakshmikantham, et al. Method of Variation of Parameters for Dynamic Systems, 1998.
[3] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[4] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[5] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[6] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[7] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[8] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[9] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[10] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[11] T. Sideris. Ordinary Differential Equations and Dynamical Systems, 2013.
[12] Nathaniel Korda, et al. On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence, 2014, ICML.
[13] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[14] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[15] V. Borkar, et al. A Concentration Bound for Stochastic Approximation via Alekseev’s Formula, 2015, Stochastic Systems.