Reinforcement Learning: Past, Present and Future
[1] C. Watkins. Learning from delayed rewards , 1989 .
[2] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[3] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[4] Thomas G. Dietterich,et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network , 1995, NIPS 1995.
[5] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[6] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[7] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[8] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[9] Doina Precup,et al. Multi-time Models for Temporally Abstract Planning , 1997, NIPS.
[10] R. Sutton. Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales , 1998 .
[11] Doina Precup,et al. Between MDPs and Semi-MDPs: Learning, Planning & Representing Knowledge at Multiple Temporal Scales , 1998 .
[12] Jonathan Baxter. KnightCap: A chess program that learns by combining TD(λ) with game-tree search , 1998 .
[13] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[14] Andrew Tridgell,et al. KnightCap: A Chess Program That Learns by Combining TD(lambda) with Game-Tree Search , 1998, ICML.
[15] Simon Haykin,et al. A dynamic channel assignment policy through Q-learning , 1999, IEEE Trans. Neural Networks.