[1] Marcin Andrychowicz et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research, 2018, arXiv.
[2] Shalabh Bhatnagar et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[3] W. Rudin. Principles of Mathematical Analysis, 1964.
[4] Masashi Sugiyama et al. Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning, 2011, Neural Computation.
[5] Sergey Levine et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, arXiv.
[6] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] Csaba Szepesvári et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[8] Yoshinobu Kawahara et al. Weighted Likelihood Policy Search with Model Selection, 2012, NIPS.
[9] Masashi Sugiyama et al. Efficient Sample Reuse in EM-Based Policy Search, 2009, ECML/PKDD.
[10] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[11] Tom Schaul et al. Episodic Reinforcement Learning by Logistic Reward-Weighted Regression, 2008, ICANN.
[12] R. Bass. Convergence of Probability Measures, 2011.
[13] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[14] C. Malsburg. Self-organization of orientation sensitive cells in the striate cortex, 2004, Kybernetik.
[15] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[16] Stefan Schaal et al. Learning to Control in Operational Space, 2008, Int. J. Robotics Res.
[17] R. L. Stratonovich. Conditional Markov Processes, 1960.
[18] Masashi Sugiyama et al. Hierarchical Policy Search via Return-Weighted Density Estimation, 2017, AAAI.
[19] Yasemin Altun et al. Relative Entropy Policy Search, 2010.
[20] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[21] Gerhard Neumann et al. Variational Inference for Policy Search in Changing Situations, 2011, ICML.
[22] Stefan Schaal et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML.
[23] Pierre Geurts et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[24] C. F. J. Wu. On the Convergence Properties of the EM Algorithm, 1983, Ann. Statist.
[25] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[26] D. Rubin et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.
[27] Geoffrey E. Hinton et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.
[28] Richard S. Sutton et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[29] Yuval Tassa et al. Relative Entropy Regularized Policy Iteration, 2018, arXiv.
[30] Sameera S. Ponda et al. Autonomous navigation of stratospheric balloons using reinforcement learning, 2020, Nature.
[31] S. Ana et al. Topology, 2018, International Journal of Mathematics Trends and Technology.
[32] R. Taylor. A User's Guide to Measure-Theoretic Probability, 2003.
[33] Tom Schaul et al. Fitness Expectation Maximization, 2008, PPSN.
[34] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[35] Jan Peters et al. Fitted Q-iteration by Advantage Weighted Regression, 2008, NIPS.
[36] Jan Peters et al. Policy Search for Motor Primitives in Robotics, 2011, Mach. Learn.
[37] Yuval Tassa et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.