Technical Note: Q-Learning
[1] P. B. Coaker,et al. Applied Dynamic Programming , 1964 .
[2] Harold J. Kushner,et al. Stochastic approximation methods for constrained and unconstrained systems , 1978 .
[3] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[4] Mitsuo Sato,et al. Learning control of finite Markov chains with an explicit trade-off between estimation and control , 1988, IEEE Trans. Syst. Man Cybern..
[5] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[6] Andrew G. Barto,et al. On the Computational Economics of Reinforcement Learning , 1991 .
[7] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[8] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[9] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[10] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[11] Richard S. Sutton,et al. Learning to Predict by the Methods of Temporal Differences , 1988, Machine Learning.