Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models
Arunkumar Byravan | Jost Tobias Springenberg | Abbas Abdolmaleki | Roland Hafner | Michael Neunert | Thomas Lampe | Noah Siegel | Nicolas Heess | Martin A. Riedmiller
[1] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[2] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[3] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[4] Sergey Levine,et al. Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning , 2018, ICLR.
[5] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.
[6] Thomas B. Schön,et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models , 2015, ICML.
[7] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[8] Alireza Fathi,et al. The Devil is in the Decoder: Classification, Regression and GANs , 2017, International Journal of Computer Vision.
[9] Yee Whye Teh,et al. Exploiting Hierarchy for Learning and Transfer in KL-regularized RL , 2019, ArXiv.
[10] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[11] Sergey Levine,et al. SOLAR: Deep Structured Latent Representations for Model-Based Reinforcement Learning , 2018, ArXiv.
[12] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.
[15] Sergio Guadarrama,et al. The Devil is in the Decoder , 2017, BMVC.
[16] Marc G. Bellemare,et al. DeepMDP: Learning Continuous Latent Space Models for Representation Learning , 2019, ICML.
[17] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.
[18] Henry Zhu,et al. Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost , 2018, 2019 International Conference on Robotics and Automation (ICRA).
[19] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[20] Karol Hausman,et al. Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.
[21] Dirk P. Kroese,et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation (Information Science and Statistics) , 2004 .
[22] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[23] Sergey Levine,et al. Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL , 2018, ICLR.
[24] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[25] Alexander Lerchner,et al. COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration , 2019, ArXiv.
[26] Pieter Abbeel,et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.
[27] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[28] Lih-Yuan Deng,et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning , 2006, Technometrics.
[29] Sergey Levine,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.
[30] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[31] Ruben Villegas,et al. Learning Latent Dynamics for Planning from Pixels , 2018, ICML.
[32] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
[33] Joel Z. Leibo,et al. Unsupervised Predictive Memory in a Goal-Directed Agent , 2018, ArXiv.
[34] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[35] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[36] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[37] Tamim Asfour,et al. Model-Based Reinforcement Learning via Meta-Policy Optimization , 2018, CoRL.
[38] Yee Whye Teh,et al. Information asymmetry in KL-regularized RL , 2019, ICLR.
[39] Sergey Levine,et al. Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control , 2018, ArXiv.
[40] Yuval Tassa,et al. Relative Entropy Regularized Policy Iteration , 2018, ArXiv.
[41] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[42] Sergey Levine,et al. Model-Based Reinforcement Learning for Atari , 2019, ICLR.
[43] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[44] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[46] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[47] Tom Schaul,et al. Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.
[48] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..
[49] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[50] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[51] Joelle Pineau,et al. Combined Reinforcement Learning via Abstract Representations , 2018, AAAI.
[52] Sergey Levine,et al. Learning to Adapt: Meta-Learning for Model-Based Control , 2018, ArXiv.
[53] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[54] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[55] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[56] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.