POMDPs and Policy Gradients
[1] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[2] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents, 1998, Machine Learning.
[3] Douglas Aberdeen, et al. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.
[4] Blai Bonet, et al. An epsilon-Optimal Grid-Based Algorithm for Partially Observable Markov Decision Processes, 2002, ICML.
[5] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[6] Katia P. Sycara, et al. Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State, 2001, ICML.
[7] Ronald E. Parr, et al. Solving Factored POMDPs with Linear Value Functions, 2001.
[8] Kee-Eung Kim, et al. Solving POMDPs by Searching the Space of Finite Policies, 1999, UAI.
[9] Andrew Tridgell, et al. KnightCap: A Chess Program That Learns by Combining TD(lambda) with Game-Tree Search, 1998, ICML.
[10] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.