Imagination-Augmented Agents for Deep Reinforcement Learning

We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.
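To make the architecture concrete, below is a minimal, hedged sketch of the imagination-augmented idea: a learned environment model is rolled forward for a few imagined steps, each rollout is encoded into a fixed-size summary, and those summaries are concatenated with a model-free feature path before the policy and value heads. This is an illustrative reconstruction, not the paper's implementation; it assumes flat vector observations and a small discrete action space, and the module names (EnvModel, RolloutEncoder, I2APolicy) are invented for this example.

```python
# Illustrative sketch only: assumes vector observations and discrete actions;
# names and hyperparameters are hypothetical, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EnvModel(nn.Module):
    """Learned environment model: predicts next observation and reward."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim + 1),          # next obs + scalar reward
        )
        self.n_actions = n_actions

    def forward(self, obs, action):
        a = F.one_hot(action, self.n_actions).float()
        out = self.net(torch.cat([obs, a], dim=-1))
        return out[..., :-1], out[..., -1]           # next_obs, reward


class RolloutEncoder(nn.Module):
    """Summarizes one imagined trajectory into a fixed-size code."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + 1, hidden, batch_first=True)

    def forward(self, obs_seq, rew_seq):
        x = torch.cat([obs_seq, rew_seq.unsqueeze(-1)], dim=-1)
        _, (h, _) = self.lstm(x)
        return h[-1]                                  # (batch, hidden)


class I2APolicy(nn.Module):
    """Combines a model-free path with encoded imagined rollouts."""
    def __init__(self, obs_dim, n_actions, rollout_len=3, hidden=64):
        super().__init__()
        self.env_model = EnvModel(obs_dim, n_actions, hidden)
        self.encoder = RolloutEncoder(obs_dim, hidden)
        self.model_free = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One rollout per candidate first action; all codes are concatenated.
        head_in = hidden + n_actions * hidden
        self.policy = nn.Linear(head_in, n_actions)
        self.value = nn.Linear(head_in, 1)
        self.n_actions, self.rollout_len = n_actions, rollout_len

    def imagine(self, obs, first_action):
        """Roll the learned model forward for a few imagined steps."""
        obs_seq, rew_seq, a = [], [], first_action
        with torch.no_grad():
            for _ in range(self.rollout_len):
                obs, r = self.env_model(obs, a)
                obs_seq.append(obs)
                rew_seq.append(r)
                a = torch.randint(0, self.n_actions, a.shape)  # stand-in rollout policy
        return torch.stack(obs_seq, dim=1), torch.stack(rew_seq, dim=1)

    def forward(self, obs):
        codes = []
        for a0 in range(self.n_actions):
            first = torch.full((obs.shape[0],), a0, dtype=torch.long)
            obs_seq, rew_seq = self.imagine(obs, first)
            codes.append(self.encoder(obs_seq, rew_seq))
        # The policy learns how to interpret (possibly imperfect) model predictions,
        # since they enter only as additional context alongside model-free features.
        features = torch.cat([self.model_free(obs)] + codes, dim=-1)
        return self.policy(features), self.value(features)


if __name__ == "__main__":
    policy = I2APolicy(obs_dim=8, n_actions=4)
    logits, value = policy(torch.randn(2, 8))
    print(logits.shape, value.shape)  # torch.Size([2, 4]) torch.Size([2, 1])
```

Because the rollout summaries are simply extra inputs to the policy network, the agent can learn to discount them when the environment model is misspecified, which is one way to read the robustness claim in the abstract.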
