Tamping Ramping: Algorithmic, Implementational, and Computational Explanations of Phasic Dopamine Signals in the Accumbens

Substantial evidence suggests that the phasic activity of dopamine neurons represents reinforcement learning’s temporal difference prediction error. However, recent reports of ramp-like increases in dopamine concentration in the striatum when animals are about to act, or are about to reach rewards, appear to pose a challenge to established thinking. This is because the implied activity is persistently predictable by preceding stimuli, and so cannot arise as this sort of prediction error. Here, we explore three possible accounts of such ramping signals: (a) the resolution of uncertainty about the timing of action; (b) the direct influence of dopamine over mechanisms associated with making choices; and (c) a new model of discounted vigour. Collectively, these suggest that dopamine ramps may be explained, with only minor disturbance, by standard theoretical ideas, though urgent questions remain regarding their proximal cause. We suggest experimental approaches to disentangling which of the proposed mechanisms are responsible for dopamine ramps.
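The argument that a persistently predictable signal cannot be a temporal difference prediction error can be illustrated with a minimal sketch. The setup here is an assumption for illustration, not the paper's model: a deterministic trial with one tabular state per time step between cue and reward, a hypothetical discount factor `GAMMA`, and values that have already converged. Under those assumptions the error delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) is zero at every step within the trial, even though the value itself rises as reward approaches.

```python
# Illustrative sketch only: a deterministic cue -> delay -> reward trial,
# with one state per time step (a hypothetical tabular representation).

GAMMA = 0.9    # assumed discount factor
N_STEPS = 5    # assumed number of time steps from cue to reward
REWARD = 1.0   # assumed reward size, delivered on the final transition


def converged_values(n_steps=N_STEPS, gamma=GAMMA, reward=REWARD):
    """Exact discounted values V(s_t) for t = 0..n_steps-1.

    The reward arrives on the transition out of the last state, and the
    terminal state after it has value 0.
    """
    return [reward * gamma ** (n_steps - 1 - t) for t in range(n_steps)]


def td_errors(values, gamma=GAMMA, reward=REWARD):
    """Compute delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) for each step."""
    n = len(values)
    deltas = []
    for t in range(n):
        r = reward if t == n - 1 else 0.0          # reward only at the end
        v_next = values[t + 1] if t + 1 < n else 0.0  # terminal value is 0
        deltas.append(r + gamma * v_next - values[t])
    return deltas


values = converged_values()
deltas = td_errors(values)

if __name__ == "__main__":
    for t, (v, d) in enumerate(zip(values, deltas)):
        print(f"t={t}  V={v:.4f}  delta={d:+.4f}")
```

In this sketch the value column increases toward the reward while the error column stays at zero up to floating-point rounding. The onset of an unpredicted cue, which would produce a brief error, is deliberately left out.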
