Implicit Counterfactual Effect in Partial Feedback Reinforcement Learning: Behavioral and Modeling Approach

Context markedly affects learning behavior by distorting the values of options relative to the distribution of available alternatives. Providing an explicit counterfactual component, that is, the outcome of the unchosen option alongside that of the chosen one (complete feedback), increases this contextual effect by inducing a comparison-based strategy during learning. It remains unclear, however, whether and how such relativity emerges when the context consists only of the juxtaposition of a series of options and no explicit counterfactual component is available (partial feedback). To investigate whether and how implicit and explicit counterfactual components affect reinforcement learning, we used partial- and complete-feedback paradigms in which options were associated with distinct reward distributions. Our modeling analysis shows that a model that uses the outcome of the chosen option to update the values of both the chosen and the unchosen option, consistent with the diffusive action of dopamine on the striatum, better accounts for the behavioral data. We also observed that the size of this bias depends on the brain systems involved: the effect is larger in the transfer phase, where subcortical systems are more engaged, and smaller in the deliberative value-estimation phase, which relies more on cortical systems. Furthermore, our data show that the contextual effect is not limited to probabilistic rewards but also extends to rewards that vary in magnitude. Together, these results show that by extending the counterfactual concept, we can better explain why contextual effects arise even in conditions where no information about the unchosen outcome is provided.
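As a minimal sketch of the kind of update rule described above (an illustrative reconstruction, not the fitted model reported in the paper), the outcome of the chosen option can drive a standard prediction-error update of its value while the same outcome, with opposite sign, updates the unchosen option, yielding a context-dependent (relative) value code. The parameter names, learning rates, and the specific opposite-sign form of the unchosen-option update are assumptions for illustration only.

```python
import numpy as np

def update_values(Q, chosen, unchosen, reward, alpha_c=0.3, alpha_u=0.3):
    """Illustrative partial-feedback update: the chosen outcome updates
    BOTH options, so learned values become relative within the context.

    Q        : array of option values for the current context
    chosen   : index of the chosen option
    unchosen : index of the unchosen option
    reward   : outcome received for the chosen option
    alpha_c  : learning rate for the chosen option (assumed)
    alpha_u  : learning rate for the unchosen option (assumed)
    """
    Q = Q.copy()
    delta = reward - Q[chosen]          # prediction error for the chosen option
    Q[chosen] += alpha_c * delta        # standard chosen-value update
    Q[unchosen] += alpha_u * (-delta)   # implicit counterfactual: the unchosen
                                        # value moves in the opposite direction
    return Q

# Toy usage: two options in one context, option 0 rewarded on 75% of trials.
rng = np.random.default_rng(0)
Q = np.zeros(2)
for _ in range(200):
    choice = int(rng.random() < 0.5)    # random exploration, for illustration only
    reward = float(rng.random() < (0.75 if choice == 0 else 0.25))
    Q = update_values(Q, choice, 1 - choice, reward)
print(Q)  # values end up ordered relative to each other rather than on an absolute scale
```

Under this kind of rule, the two values are pushed apart by every chosen outcome, which is one way an implicit counterfactual effect could emerge even though the unchosen outcome is never shown.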
