[1] Sergey Levine et al. Guided Policy Search, 2013, ICML.
[2] Richard S. Sutton et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[3] Sergey Levine et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[4] Sergey Levine et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[5] Sergey Levine et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[6] Gabriel Kalweit et al. Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning, 2017, CoRL.
[7] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[8] Sergey Levine et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, 2018, arXiv.
[9] Sergey Levine et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[10] Alborz Geramifard et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping, 2008, UAI.
[11] Yuval Tassa et al. Synthesis and stabilization of complex behaviors through online trajectory optimization, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[12] R. J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[13] Richard S. Sutton et al. Introduction to Reinforcement Learning, 1998.
[14] Yuval Tassa et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[15] Dimitri P. Bertsekas et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[16] Pieter Abbeel et al. Model-Ensemble Trust-Region Policy Optimization, 2018, ICLR.
[17] Sergey Levine et al. Trust Region Policy Optimization, 2015, ICML.
[18] Sham M. Kakade et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control, 2018, ICLR.
[19] Carl E. Rasmussen et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[20] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[21] Razvan Pascanu et al. Imagination-Augmented Agents for Deep Reinforcement Learning, 2017, NIPS.
[22] Sergey Levine et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[23] Yuval Tassa et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[24] Sergey Levine et al. Path integral guided policy search, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[25] Yuval Tassa et al. Control-limited differential dynamic programming, 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).
[26] Richard S. Sutton et al. Sample-based learning and search with permanent and transient memories, 2008, ICML '08.
[27] David R. Anderson et al. Model selection and multimodel inference: a practical information-theoretic approach, 2003.
[28] Manfred Morari et al. Model predictive control: Theory and practice - A survey, 1989, Automatica.
[29] Pieter Abbeel et al. Value Iteration Networks, 2016, NIPS.
[30] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[31] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[32] Arthur G. Richards et al. Robust constrained model predictive control, 2005.
[33] Sergey Levine et al. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[34] Sergey Levine et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control, 2018, ICLR.
[35] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[36] Shane Legg et al. Human-level control through deep reinforcement learning, 2015, Nature.
[37] E. Todorov et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, 2005, Proceedings of the 2005 American Control Conference.
[38] David K. Smith et al. Dynamic Programming and Optimal Control, Volume 1, 1996.
[39] Marc Peter Deisenroth et al. Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, 2017, AISTATS.
[40] Honglak Lee et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[41] Brian Everitt et al. Miscellaneous Clustering Methods, 2011.
[42] R. Rubinstein. The Cross-Entropy Method for Combinatorial and Continuous Optimization, 1999.
[43] Satinder Singh et al. Value Prediction Network, 2017, NIPS.
[44] Tamim Asfour et al. Model-Based Reinforcement Learning via Meta-Policy Optimization, 2018, CoRL.
[45] Nolan Wagener et al. Information theoretic MPC for model-based reinforcement learning, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).