Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task

The recently developed ‘two-step’ behavioural task promises to differentiate model-based from model-free reinforcement learning, while generating neurophysiologically friendly decision datasets with parametric variation of decision variables. These desirable features have prompted its widespread adoption. Here, we analyse the interactions between a range of different strategies and the structure of transitions and outcomes in order to examine constraints on what can be learned from behavioural performance. The task involves a trade-off between the need for stochasticity, which allows strategies to be discriminated, and the need for determinism, which makes it worth subjects’ investment of effort to exploit the contingencies optimally. We show through simulation that under certain conditions model-free strategies can masquerade as being model-based. We first show that seemingly innocuous modifications to the task structure can induce correlations between action values at the start of the trial and the subsequent trial events, such that analysis based on comparing successive trials can lead to erroneous conclusions. We confirm the power of a suggested correction to the analysis that can alleviate this problem. We then consider model-free reinforcement learning strategies that exploit correlations between where rewards are obtained and which actions have high expected value. These strategies generate behaviour that appears model-based under both the successive-trial analysis and more sophisticated analyses. Exploiting the full potential of the two-step task as a tool for behavioural neuroscience requires an understanding of these issues.
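
The analysis based on comparing successive trials is often implemented as a stay-probability analysis: the fraction of trials on which the first-step choice is repeated, split by the previous trial's outcome and transition type. The following is a minimal sketch under that assumption. The function name `stay_probabilities`, the `'common'`/`'rare'` labels and the data layout are illustrative choices, not taken from the paper, and the sketch does not include the correction discussed above.

```python
from collections import defaultdict


def stay_probabilities(choices, transitions, rewards):
    """Return P(repeat first-step choice) keyed by (rewarded, transition).

    choices     -- first-step choice on each trial (any hashable label)
    transitions -- 'common' or 'rare' for each trial (assumed labels)
    rewards     -- truthy if the trial was rewarded, falsy otherwise
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [stays, total]
    # The last trial has no successor, so it only serves as a "next" trial.
    for t in range(len(choices) - 1):
        key = (bool(rewards[t]), transitions[t])
        counts[key][0] += int(choices[t + 1] == choices[t])
        counts[key][1] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


# Toy data, invented purely to exercise the function.
example = stay_probabilities(
    choices=["A", "A", "B", "B", "A"],
    transitions=["common", "rare", "common", "rare", "common"],
    rewards=[1, 0, 1, 0, 1],
)
print(example)
```

In this kind of analysis, a main effect of reward on staying is usually read as model-free, and a reward-by-transition interaction as model-based. The abstract's point is that this reading can fail when action values at the start of a trial are correlated with subsequent trial events.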
