Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference

Dopamine is implicated in signalling model-free (MF) reward prediction errors as well as various aspects of model-based (MB) credit assignment and choice. Recently, we showed that cooperative interactions between MB and MF systems include guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test the hypothesis that enhancing dopamine levels, using levodopa, boosts guidance of MF credit assignment by MB inference. We found that levodopa enhanced retrospective guidance of MF credit assignment by MB inference, without impacting MF or MB influences per se. This drug effect correlated positively with working memory, but only in a context where reward needed to be recalled for MF credit assignment. The dopaminergic enhancement of MB-MF interactions correlated negatively with a dopamine-dependent change in MB credit assignment, possibly reflecting a trade-off between these two components of behavioural control. Thus, our findings demonstrate that dopamine boosts MB inference during guidance of MF learning, an effect supported in part by working memory but trading off against a dopaminergic enhancement of MB credit assignment. These findings highlight a novel role for dopaminergic influence on MB-MF interactions.
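To make the mechanism concrete, the following is a minimal illustrative sketch (not the authors' actual task or model) of how retrospective MB inference can guide an MF credit-assignment update. All names, the toy transition model, and the trial structure are hypothetical: after a reward is observed, the agent uses its world model to infer which recent state-action pair most plausibly caused the outcome, and applies the MF prediction-error update to that inferred pair rather than indiscriminately to the most recent action.

```python
import numpy as np

n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))           # MF cached values
alpha = 0.5                                   # MF learning rate

# Hypothetical world model: model[s, a, s'] = P(outcome state s' | state s, action a)
model = np.array([[[0.9, 0.1], [0.1, 0.9]],   # transitions from state 0
                  [[0.9, 0.1], [0.1, 0.9]]])  # transitions from state 1

def mb_infer_cause(outcome_state, history):
    """Retrospective MB inference: which (state, action) pair in the
    recent history most plausibly led to the observed outcome state?"""
    likelihoods = [model[s, a, outcome_state] for (s, a) in history]
    return history[int(np.argmax(likelihoods))]

# One illustrative trial: the agent acted in state 0 (action 0), then
# state 1 (action 1), and then observed a reward in outcome state 1.
history = [(0, 0), (1, 1)]
outcome_state, reward = 1, 1.0

s_cred, a_cred = mb_infer_cause(outcome_state, history)
delta = reward - Q[s_cred, a_cred]            # MF reward prediction error
Q[s_cred, a_cred] += alpha * delta            # credit the inferred cause
```

Here the model assigns likelihood 0.9 to the pair (state 1, action 1) versus 0.1 to (state 0, action 0), so the MF update is directed at the MB-inferred cause. The hypothesised levodopa effect corresponds to a sharpening of this inference step, leaving the MF update rule and the MB controller themselves unchanged.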
