The Incentives that Shape Behaviour

Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives, and demonstrate unique graphical criteria for detecting them in any single-decision causal influence diagram. To this end, we introduce structural causal influence models, a hybrid of the influence diagram and structural causal model frameworks. Finally, we illustrate how these criteria predict agent incentives in both fairness and AI safety applications.
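To give a rough sense of what a graphical criterion on a single-decision diagram can look like, the sketch below flags a variable when it lies on a directed path from the decision to the utility node. This path-based check is an illustrative assumption and is not taken from the abstract, which does not state the criteria themselves. The graph, the node names (Z, D, X, W, U), and the function names are all hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of nodes reachable from `start` along directed edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def on_directed_path(graph, decision, utility, x):
    """True if x lies on some directed path decision -> ... -> x -> ... -> utility.

    Assumed, simplified stand-in for a graphical incentive criterion;
    the graph is expected to be acyclic.
    """
    return x in reachable(graph, decision) and utility in reachable(graph, x)


# Hypothetical single-decision diagram given as an adjacency list:
# Z is observed by the decision D, D influences X, and X and W both feed the utility U.
edges = {"Z": ["D"], "D": ["X"], "X": ["U"], "W": ["U"]}

flagged = [v for v in ["Z", "X", "W"] if on_directed_path(edges, "D", "U", v)]
print(flagged)  # ['X']
```

Only X is flagged here: Z is upstream of the decision, and W affects the utility without being affected by the decision.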
