Adaptive importance sampling for value function approximation in off-policy reinforcement learning

Masashi Sugiyama | Jan Peters | Hirotaka Hachiya | Takayuki Akiyama

[1] Stefan Schaal, et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients, 2008.

[2] Klaus-Robert Müller, et al. Covariate Shift Adaptation by Importance Weighted Cross Validation, 2007, J. Mach. Learn. Res.

[3] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML '07.

[4] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[5] J. Franklin, et al. The elements of statistical learning: data mining, inference and prediction, 2005.

[6] Motoaki Kawanabe, et al. Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression, 2004, Neural Computation.

[7] M. Bugeja, et al. Non-linear swing-up and stabilizing control of an inverted pendulum system, 2003, The IEEE Region 8 EUROCON 2003. Computer as a Tool.

[8] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.

[9] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.

[10] Ralf Schoknecht, et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, 2002, NIPS.

[11] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.

[12] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.

[13] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.

[14] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.

[15] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.

[16] H. Shimodaira, et al. Improving predictive inference under covariate shift by weighting the log-likelihood function, 2000.

[17] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.

[18] Ronald L. Wasserstein, et al. Monte Carlo: Concepts, Algorithms, and Applications, 1997.

[19] Kazuo Tanaka, et al. An approach to fuzzy control of nonlinear systems: stability and design issues, 1996, IEEE Trans. Fuzzy Syst.

[20] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[21] N. L. Johnson. Linear Statistical Inference and Its Applications, 1966.