Projections for Approximate Policy Iteration Algorithms
Jan Peters | Gerhard Neumann | Joni Pajarinen | Riad Akrour