Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Junichiro Yoshimoto | Jan Peters | Kenji Doya | Tetsuro Morimura | Eiji Uchibe
[1] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[2] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[3] Junichiro Yoshimoto, et al. A New Natural Policy Gradient by Stationary Distribution Metric, 2008, ECML/PKDD.
[4] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.
[5] Sham M. Kakade, et al. Optimizing Average Reward Using Discounted Rewards, 2001, COLT/EuroCOLT.
[6] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[7] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[8] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[9] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[10] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[11] Peter C. Young, et al. Recursive Estimation and Time-Series Analysis: An Introduction, 1984.
[12] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[13] Shigenobu Kobayashi, et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function, 1998, ICML.
[14] Douglas Aberdeen, et al. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.
[15] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[16] Dimitri P. Bertsekas, et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[17] Peter Sollich, et al. Advances in neural information processing systems 11, 1999.
[18] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[19] Tao Wang, et al. Stable Dual Dynamic Programming, 2007, NIPS.
[20] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[21] Yuhong Yang, et al. Information Theory, Inference, and Learning Algorithms, 2005.
[22] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[23] Kenji Doya, et al. Natural actor-critic with baseline adjustment for variance reduction, 2008, Artificial Life and Robotics.
[24] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[25] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[26] Motoaki Kawanabe, et al. A semiparametric statistical approach to model-free policy evaluation, 2008, ICML '08.
[27] Andrew G. Barto, et al. Reinforcement learning, 1998.
[28] Andrew Y. Ng, et al. Policy Search via Density Estimation, 1999, NIPS.
[29] Junichiro Yoshimoto, et al. A Generalized Natural Actor-Critic Algorithm, 2009, NIPS.
[30] K. Doya, et al. Policy gradient reinforcement learning with log stationary distribution gradients, 2007.
[31] Kenji Doya, et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[32] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[33] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[34] John N. Tsitsiklis, et al. On Average Versus Discounted Reward Temporal-Difference Learning, 2002, Machine Learning.
[35] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[36] R. Rubinstein. How to optimize discrete-event systems from a single sample path by the score function method, 1991.
[37] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[38] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[39] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.
[40] Vijaykumar Gullapalli, et al. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.
[41] Tao Wang, et al. Dual Representations for Dynamic Programming and Reinforcement Learning, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[42] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.