Monte Carlo Planning Method Estimates Planning Horizons during Interactive Social Exchange

Reciprocating interactions represent a central feature of all human exchanges. They have been the target of various recent experiments, with healthy participants and psychiatric populations engaging as dyads in multi-round exchanges such as a repeated trust task. Behaviour in such exchanges involves complexities related to each agent’s preference for equity with their partner, beliefs about the partner’s appetite for equity, beliefs about the partner’s model of their partner, and so on. Agents may also plan different numbers of steps into the future. Providing a computationally precise account of the behaviour is an essential step towards understanding what underlies choices. A natural framework for this is that of an interactive partially observable Markov decision process (IPOMDP). However, the various complexities make IPOMDPs inordinately computationally challenging. Here, we show how to approximate the solution for the multi-round trust task using a variant of the Monte-Carlo tree search algorithm. We demonstrate that the algorithm is efficient and effective, and therefore can be used to invert observations of behavioural choices. We use generated behaviour to elucidate the richness and sophistication of interactive inference.
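To make the sampling idea concrete, the following is a minimal sketch of the bandit rule (UCB1) that typically drives action selection in Monte-Carlo tree search, applied to a single investor decision in a toy trust exchange. It is not the IPOMDP solver described above. The endowment, multiplier, trustee behaviour, and all function names are hypothetical choices made for illustration only.

```python
import math
import random

# Hypothetical toy parameters (assumptions, not taken from the text above).
ENDOWMENT = 20.0        # units the investor holds each round
MULTIPLIER = 3.0        # the invested amount is multiplied before reaching the trustee
RETURN_FRACTION = 0.5   # share of the pot a cooperative trustee sends back
RETURN_PROB = 0.9       # probability that the simulated trustee cooperates
MAX_PAYOFF = 30.0       # largest investor payoff here, used to scale rewards into [0, 1]


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the action with the highest UCB1 score.

    Actions that have never been tried are selected first.
    """
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda a: values[a] + c * math.sqrt(math.log(total) / counts[a]),
    )


def simulate_round(invest_fraction, rng):
    """Sample the investor's payoff for one round against the toy trustee."""
    invested = invest_fraction * ENDOWMENT
    pot = MULTIPLIER * invested
    repaid = RETURN_FRACTION * pot if rng.random() < RETURN_PROB else 0.0
    return ENDOWMENT - invested + repaid


def plan(n_simulations=20000, seed=0):
    """Estimate action values by repeated simulation; return (best action, visit counts)."""
    rng = random.Random(seed)
    actions = [0.0, 0.25, 0.5, 0.75, 1.0]  # fraction of the endowment invested
    counts = [0] * len(actions)
    values = [0.0] * len(actions)
    for _ in range(n_simulations):
        a = ucb1_select(counts, values)
        reward = simulate_round(actions[a], rng) / MAX_PAYOFF
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean update
    best = max(range(len(actions)), key=lambda a: counts[a])
    return actions[best], counts


if __name__ == "__main__":
    best_action, visit_counts = plan()
    print("most visited investment fraction:", best_action)
    print("visit counts:", visit_counts)
```

A full tree search would apply the same selection rule at every decision node and would simulate the partner from a belief over partner types rather than from a fixed cooperation probability.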
