Model-Free Episodic Control

State-of-the-art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
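The core idea of episodic control can be sketched in a few lines: keep a non-parametric memory that maps each (state, action) pair to the highest discounted return ever obtained from it, and estimate values for novel states from their nearest stored neighbours. The snippet below is a minimal illustrative sketch under those assumptions, not the paper's exact algorithm; the class name, the simple per-action dictionaries, and the brute-force nearest-neighbour search are all simplifications introduced here for clarity.

```python
import numpy as np


class EpisodicController:
    """Hypothetical minimal sketch of episodic control: one buffer per
    action maps state embeddings to the best discounted return seen."""

    def __init__(self, num_actions, k=3):
        self.num_actions = num_actions
        self.k = k  # neighbours used to value previously unseen states
        self.buffers = [dict() for _ in range(num_actions)]  # key -> return

    def _estimate(self, state, action):
        buf = self.buffers[action]
        key = tuple(state)
        if key in buf:
            # Exact match: reuse the stored return directly.
            return buf[key]
        if not buf:
            return 0.0
        # Otherwise average the returns of the k nearest stored states
        # (brute-force Euclidean search, purely for illustration).
        keys = list(buf.keys())
        dists = [np.linalg.norm(np.subtract(k2, state)) for k2 in keys]
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean([buf[keys[i]] for i in nearest]))

    def act(self, state):
        # Greedy choice over the episodic value estimates.
        values = [self._estimate(state, a) for a in range(self.num_actions)]
        return int(np.argmax(values))

    def update(self, state, action, discounted_return):
        # Keep only the highest return ever obtained from (state, action);
        # this max-update is what lets a single lucky episode be exploited
        # immediately, without gradient-based value learning.
        buf = self.buffers[action]
        key = tuple(state)
        buf[key] = max(buf.get(key, -np.inf), discounted_return)
```

At the end of each episode the agent would replay the trajectory backwards, compute the discounted return from every step, and call `update`; acting then requires no training step at all, which is the source of the rapid learning contrasted with deep RL above.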
