Learning to use past evidence in a sophisticated world model

Humans and other animals are able to discover underlying statistical structure in their environments and exploit it to achieve efficient and effective performance. However, such structure is often difficult to learn and use because it is obscure, involving long-range temporal dependencies. Here, we analysed behavioural data from an extended experiment with rats, showing that the subjects learned the underlying statistical structure, although at times they made imperfect moment-to-moment inferences about their current state within it. We accounted for their behaviour using a Hidden Markov Model, in which recent observations are integrated with evidence from the past. We found that over the course of training, subjects came to track their progress through the task more accurately, a change that our model largely attributed to improved integration of past evidence. This learning reflected the structure of the task: it decreased reliance on recent observations, which were potentially misleading.
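To make the idea of integrating recent observations with past evidence concrete, the following is a minimal sketch of belief updating in a Hidden Markov Model. It is not the model fitted in the study: the three-state cyclic task, the transition and emission tables, and the `past_weight` parameter (which blends the model's prediction with a uniform prior) are all hypothetical choices made for illustration.

```python
# Minimal HMM filtering sketch. All names and numbers here are hypothetical
# illustrations, not the parameterisation used in the paper.

def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def filter_step(belief, observation, transition, emission, past_weight=1.0):
    """Update the belief over hidden states after one new observation.

    past_weight = 1.0 is the standard forward update; past_weight = 0.0
    discards past evidence and relies on the current observation alone.
    """
    n = len(belief)
    # Predict: carry the previous belief forward through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Assumed blend: weight the prediction against a uniform prior.
    prior = [past_weight * p + (1.0 - past_weight) / n for p in predicted]
    # Correct: multiply by the likelihood of the new observation.
    posterior = [prior[j] * emission[j][observation] for j in range(n)]
    return normalise(posterior)


def run_filter(observations, transition, emission, initial, past_weight=1.0):
    """Return the belief after each observation in the sequence."""
    belief = list(initial)
    history = []
    for obs in observations:
        belief = filter_step(belief, obs, transition, emission, past_weight)
        history.append(belief)
    return history


# Hypothetical task: three states visited in a fixed cycle 0 -> 1 -> 2 -> 0.
TRANSITION = [[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]]
# Each state usually emits its own cue, but sometimes a misleading one.
EMISSION = [[0.8, 0.1, 0.1],
            [0.1, 0.8, 0.1],
            [0.1, 0.1, 0.8]]
INITIAL = [1.0, 0.0, 0.0]   # the subject starts in state 0
CUES = [1, 2, 1]            # the last cue is misleading: the true state is 0

with_past = run_filter(CUES, TRANSITION, EMISSION, INITIAL, past_weight=1.0)
without_past = run_filter(CUES, TRANSITION, EMISSION, INITIAL, past_weight=0.0)
```

In this toy setting, a filter that fully uses past evidence keeps its belief on the correct state despite the misleading final cue, whereas one that ignores the past follows the cue. This mirrors, in a simplified way, the shift away from reliance on recent observations described in the abstract.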
