Reinforcement Learning in Continuous Action Spaces
暂无分享,去创建一个
[1] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[2] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[3] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[4] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[5] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[6] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[7] Leemon C Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
[8] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[9] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[10] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[11] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.