Adaptive Importance Sampling with Automatic Model Selection in Value Function Approximation
Hirotaka Hachiya | Takayuki Akiyama | Masashi Sugiyama | Jan Peters