Policy Improvement for POMDPs Using Normalized Importance Sampling