Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Evan Greensmith | Peter L. Bartlett | Jonathan Baxter
[1] P. Prescott, et al. Monte Carlo Methods , 1964, Computational Statistical Physics.
[2] E. Seneta. Non-negative Matrices and Markov Chains (Springer Series in Statistics) , 1981 .
[3] G. Grimmett, et al. Probability and Random Processes , 2002 .
[4] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[5] Alan Weiss, et al. Sensitivity Analysis for Simulations via Likelihood Ratios , 1989, Oper. Res..
[6] R. Rubinstein. How to optimize discrete-event systems from a single sample path by the score function method , 1991 .
[7] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[8] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[10] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[11] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[12] Shigenobu Kobayashi, et al. Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward , 1995, ICML.
[13] P. Glynn, et al. Likelihood ratio gradient estimation for stochastic recursions , 1995, Advances in Applied Probability.
[14] Ronald L. Wasserstein, et al. Monte Carlo: Concepts, Algorithms, and Applications , 1997 .
[15] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation , 1997, ICML.
[16] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[17] Shigenobu Kobayashi, et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function , 1998, ICML.
[18] Shigenobu Kobayashi, et al. Reinforcement learning for continuous action using stochastic gradient ascent , 1998 .
[19] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[20] Richard S. Sutton, et al. Introduction to Reinforcement Learning , 1998 .
[21] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes , 1998, Proceedings of the 37th IEEE Conference on Decision and Control.
[22] John N. Tsitsiklis, et al. Actor-Critic Algorithms , 1999, NIPS.
[23] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[24] Flemming Topsøe, et al. Some inequalities for information divergence and related measures of discrimination , 2000, IEEE Trans. Inf. Theory.
[25] Tim B. Swartz,et al. Approximating Integrals Via Monte Carlo and Deterministic Methods , 2000 .
[26] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[27] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[28] Peter L. Bartlett, et al. Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning , 2000, J. Comput. Syst. Sci..
[29] Vijay R. Konda, et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[30] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 1996, Machine Learning.
[31] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[32] Richard S. Sutton, et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[33] Richard S. Sutton. Learning to predict by the methods of temporal differences , 1988, Machine Learning.