Learning from Scarce Experience
[1] Reuven Y. Rubinstein,et al. Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.
[2] Peter W. Glynn,et al. Optimization of stochastic systems , 1986, WSC '86.
[3] Donald L. Iglehart,et al. Importance sampling for stochastic simulations , 1989 .
[4] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[5] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[6] P. Glynn. Importance sampling for Markov chains: asymptotics for the variance , 1994 .
[7] P. Marbach. Simulation-Based Methods for Markov Decision Processes , 1998 .
[8] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[9] Neri Merhav,et al. Universal Prediction , 1998, IEEE Trans. Inf. Theory.
[10] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[11] Leslie Pack Kaelbling,et al. Learning Policies with External Memory , 1999, ICML.
[12] Yishay Mansour,et al. Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.
[13] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[14] Kee-Eung Kim,et al. Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.
[15] Kee-Eung Kim,et al. Learning to Cooperate via Policy Search , 2000, UAI.
[16] Peter W. Glynn,et al. Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice , 2000, NIPS.
[17] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[18] Peter L. Bartlett,et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent , 2000, ICML.
[19] Nicolas Meuleau,et al. Exploration in Gradient-Based Reinforcement Learning , 2001 .
[20] Leonid Peshkin,et al. Bounds on Sample Size for Policy Evaluation in Markov Environments , 2001, COLT/EuroCOLT.
[21] Christian R. Shelton,et al. Policy Improvement for POMDPs Using Normalized Importance Sampling , 2001, UAI.
[22] Christian R. Shelton,et al. Importance sampling for reinforcement learning with multiple objectives , 2001 .
[23] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.