Temporal Difference Updating without a Learning Rate
[1] Ian H. Witten, et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[2] C. Watkins. Learning from delayed rewards, 1989.
[3] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[4] Gavin Adrian Rummery. Problem solving with reinforcement learning, 1995.
[5] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[6] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[7] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.
[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[9] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[10] Warren B. Powell, et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, 2006, Machine Learning.