Bayesian Policy Gradient Algorithms
[1] R. Wolpert, et al. Likelihood Principle, 2022, The SAGE Encyclopedia of Research Design.
[2] Anthony O'Hagan, et al. Monte Carlo is fundamentally unsound, 1987.
[3] A. O'Hagan, et al. Bayes–Hermite quadrature, 1991.
[4] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[5] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[6] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[7] David Haussler, et al. Exploiting Generative Models in Discriminative Classifiers, 1998, NIPS.
[8] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[9] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[10] Carl E. Rasmussen, et al. Bayesian Monte Carlo, 2002, NIPS.
[11] Stefan Schaal, et al. Reinforcement Learning for Humanoid Robotics, 2003.
[12] Jeff G. Schneider, et al. Covariant Policy Search, 2003, IJCAI.
[13] Nello Cristianini, et al. Kernel Methods for Pattern Analysis, 2006.
[14] Yaakov Engel, et al. Algorithms and representations for reinforcement learning, 2005.
[15] R. Sutton, et al. Actor-critic Algorithms 1. Policy Gradient Methods for Reinforcement Learning with Function Average Reward Td Actor-critic Algorithm Using Function Approximation.