[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] F. Downton. Stochastic Approximation , 1969, Nature.
[3] M. T. Wasan. Stochastic Approximation , 1969 .
[4] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[5] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[6] V. Borkar. Stochastic approximation with two time scales , 1997 .
[7] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[8] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[9] Sebastian Thrun,et al. Simultaneous localization and mapping with unknown data association using FastSLAM , 2003, 2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422).
[10] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[11] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[12] J. Tsitsiklis,et al. Convergence rate of linear two-time-scale stochastic approximation , 2004, math/0405287.
[13] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[14] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[15] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[16] H. Robbins. A Stochastic Approximation Method , 1951 .
[17] R. Sutton,et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[18] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[19] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[20] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[21] Eric Moulines,et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , 2013, NIPS.
[22] Richard S. Sutton,et al. Weighted importance sampling for off-policy learning with linear function approximation , 2014, NIPS.
[23] Richard S. Sutton,et al. True online TD(λ) , 2014, ICML 2014.
[24] R. Sutton,et al. A new Q(λ) with interim forward view and Monte Carlo equivalence , 2014 .
[25] Doina Precup,et al. A new Q(λ) with interim forward view and Monte Carlo equivalence , 2014, ICML.
[26] Richard S. Sutton,et al. Off-policy TD(λ) with a true online equivalence , 2014, UAI.
[27] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[28] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..