The Incentives that Shape Behaviour

Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives and establish unique graphical criteria for detecting them in any single-decision causal influence diagram. To this end, we introduce structural causal influence models, a hybrid of the influence diagram and structural causal model frameworks. Finally, we illustrate how these criteria predict agent incentives in both fairness and AI safety applications.
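To give a rough sense of what a graphical check on an influence diagram can look like, the sketch below tests a simple necessary condition for a variable to be worth controlling: the decision must be able to influence the variable, and the variable must be able to influence utility, so the variable has to lie on a directed path from the decision to a utility node. This is only an illustration under stated assumptions. The abstract does not spell out the criteria themselves, and the graph, the node names (`Z`, `D`, `X`, `W`, `U`), and the helper functions are all hypothetical.

```python
from collections import deque


def descendants(graph, start):
    """Return every node reachable from `start` along directed edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def on_directed_path(graph, decision, variable, utility):
    """True if `variable` lies on a directed path decision -> ... -> utility.

    Assumption: this is a simplified necessary condition, not the paper's
    full criterion, which may impose further requirements.
    """
    return (variable in descendants(graph, decision)
            and utility in descendants(graph, variable))


# Hypothetical toy diagram, stored as adjacency lists (parent -> children).
toy_cid = {
    "Z": ["D"],  # Z is observed before the decision D is made
    "D": ["X"],  # the decision influences X
    "X": ["U"],  # X influences the utility U
    "W": ["U"],  # W influences U but is out of the decision's reach
}

candidates = [v for v in ("X", "W", "Z")
              if on_directed_path(toy_cid, "D", v, "U")]
print(candidates)  # ['X']
```

In this toy graph only `X` passes the check: `W` affects utility but cannot be influenced by the decision, and `Z` sits upstream of the decision rather than downstream of it.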
