[1] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[2] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[3] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[4] Philippe Preux,et al. Basis Expansion in Natural Actor Critic Methods , 2008, EWRL.
[5] V. Borkar. Stochastic approximation with two time scales , 1997 .
[6] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res.
[7] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[8] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[9] E. J. Collins,et al. Convergent multiple-timescales reinforcement learning algorithms in normal form games , 2003 .
[10] Philippe Preux,et al. Recent Advances in Reinforcement Learning: 8th European Workshop, EWRL 2008, Villeneuve d'Ascq, France, June 30-July 3, 2008, Revised and Selected Papers , 2008 .
[11] S. Haykin,et al. Neural Networks: A Comprehensive Foundation , 1994 .
[12] Dimitri P. Bertsekas,et al. Basis function adaptation methods for cost approximation in MDP , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[13] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[14] Ralf Schoknecht,et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation , 2002, NIPS.
[15] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[16] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Vol. II , 1976 .
[17] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[18] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[19] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom.
[20] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim.
[21] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.
[22] R. Sutton,et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[23] Bart De Schutter,et al. Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[24] K. I. M. McKinnon,et al. On the Generation of Markov Decision Processes , 1995 .
[25] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009 .
[26] A. Mokkadem,et al. Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms , 2006, math/0610329.