Eligibility Traces for Off-Policy Policy Evaluation
[1] Reuven Y. Rubinstein, et al. Simulation and the Monte Carlo method, 1981, Wiley series in probability and mathematical statistics.
[2] M. J. Fryer, et al. Simulation and the Monte Carlo method, 1981, Wiley series in probability and mathematical statistics.
[3] Leslie Pack Kaelbling, et al. Hierarchical Learning in Stochastic Domains: Preliminary Results, 1993, ICML.
[4] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[5] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[6] Ronald E. Parr, et al. Hierarchical control and learning for Markov decision processes, 1998.
[7] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.
[8] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning, 1998, ICML.
[9] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[10] Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000, ICML 2000.