Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

We derive a new intrinsic social motivation for multi-agent reinforcement learning (MARL), in which agents are rewarded for having causal influence over another agent's actions. Causal influence is assessed using counterfactual reasoning. The reward does not depend on observing another agent's reward function, and is thus a more realistic approach to MARL than that taken in previous work. We show that the causal influence reward is related to maximizing the mutual information between agents' actions. We test the approach in challenging social dilemma environments, where it consistently leads to enhanced cooperation between agents and higher collective reward. Moreover, we find that rewarding influence can lead agents to develop emergent communication protocols. We therefore employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward. Finally, we show that influence can be computed by equipping each agent with an internal model that predicts the actions of other agents. This allows the social influence reward to be computed without the use of a centralised controller, and as such represents a significantly more general and scalable inductive bias for MARL with independent agents.
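The counterfactual computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the influencer can evaluate the influencee's action distribution under each of its own counterfactual actions, and all function and variable names are illustrative. The influence reward is the KL divergence between the influencee's action distribution conditioned on the influencer's chosen action and the counterfactual marginal obtained by averaging out the influencer's action, which is exactly the quantity whose expectation is the mutual information between the two agents' actions.

```python
import numpy as np

def influence_reward(cond_probs, influencer_policy, taken_action):
    """Counterfactual social influence reward (illustrative sketch).

    cond_probs: array of shape (A_k, A_j); row a_k holds the influencee's
        action distribution p(a_j | a_k, s) under counterfactual action a_k
        of the influencer.
    influencer_policy: array of shape (A_k,); the influencer's own policy
        p(a_k | s), used to marginalise over its counterfactual actions.
    taken_action: index of the action the influencer actually took.

    Returns KL[ p(a_j | a_k, s) || p(a_j | s) ], the influencer's causal
    influence on the influencee at this state.
    """
    eps = 1e-12  # numerical guard against log(0)
    # Counterfactual marginal: p(a_j | s) = sum_{a'_k} p(a_j | a'_k, s) p(a'_k | s)
    marginal = influencer_policy @ cond_probs  # shape (A_j,)
    conditional = cond_probs[taken_action]
    return float(np.sum(
        conditional * (np.log(conditional + eps) - np.log(marginal + eps))
    ))
```

If the influencee's distribution is identical under every counterfactual action, the conditional equals the marginal and the reward is zero: the influencer had no causal effect. In a decentralised setting, `cond_probs` would come from each agent's learned internal model of the other agents rather than from their true policies.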
