Go and no-go learning in reward and punishment: Interactions between affect and effect

Decision-making invokes two fundamental axes of control: affect or valence, spanning reward and punishment, and effect or action, spanning invigoration and inhibition. We studied the acquisition of instrumental responding in healthy human volunteers in a task in which we orthogonalized action requirements and outcome valence. Subjects were much more successful in learning active choices in rewarded conditions, and passive choices in punished conditions. Using computational reinforcement-learning models, we teased apart contributions from putatively instrumental and Pavlovian components in the generation of the observed asymmetry during learning. Moreover, using model-based fMRI, we showed that BOLD signals in striatum and substantia nigra/ventral tegmental area (SN/VTA) correlated with instrumentally learnt action values, but with opposite signs for go and no-go choices. Finally, we showed that successful instrumental learning depends on engagement of bilateral inferior frontal gyrus. Our behavioral and computational data showed that instrumental learning is contingent on overcoming inherent and plastic Pavlovian biases, while our neuronal data showed that this learning is linked to unique patterns of brain activity in regions implicated in action and inhibition, respectively.
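
The abstract does not specify the form of the reinforcement-learning models, so the following is only a minimal sketch of one common way to combine an instrumental and a Pavlovian component. All names and parameters here (`go_bias`, `pav_weight`, `rw_update`, the logistic choice rule) are illustrative assumptions and are not taken from the paper. The idea shown is that a learnt state value adds to the propensity to "go", so a positive value (reward expected) promotes action and a negative value (punishment expected) promotes inhibition.

```python
import math


def action_weights(q_go, q_nogo, v_state, go_bias=0.0, pav_weight=0.0):
    """Combine instrumental action values with a Pavlovian term.

    Assumption (not from the source): the Pavlovian influence is modelled
    as pav_weight * v_state added to the weight of the 'go' action only.
    """
    w_go = q_go + go_bias + pav_weight * v_state
    w_nogo = q_nogo
    return w_go, w_nogo


def p_go(q_go, q_nogo, v_state, go_bias=0.0, pav_weight=0.0):
    """Probability of choosing 'go' under a logistic (two-action softmax) rule."""
    w_go, w_nogo = action_weights(q_go, q_nogo, v_state, go_bias, pav_weight)
    return 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))


def rw_update(value, outcome, learning_rate):
    """Rescorla-Wagner style update: move the value towards the observed outcome."""
    return value + learning_rate * (outcome - value)


if __name__ == "__main__":
    # With equal instrumental values, the Pavlovian term alone shifts choice.
    print(p_go(0.0, 0.0, v_state=1.0, pav_weight=0.5))   # reward expected: above 0.5
    print(p_go(0.0, 0.0, v_state=-1.0, pav_weight=0.5))  # punishment expected: below 0.5
```

Under these assumptions, learning to withhold a response for reward, or to act in order to avoid punishment, requires the instrumental values to grow large enough to outweigh the Pavlovian term, which is one way to picture the asymmetry described above.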
