Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

Agents are systems that optimize an objective function in an environment. Together, the goal and the environment induce secondary objectives, which we call incentives. By modeling the agent-environment interaction with graphical models called influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes is the agent incentivized to observe, and (2) which nodes is the agent incentivized to influence? The answers tell us which information and influence points need extra protection. For example, we may want a classifier for job applications not to use the candidate's ethnicity, and a reinforcement learning agent not to take direct control of its reward mechanism. Different algorithms and training paradigms can lead to different influence diagrams, so our method can be used both to identify algorithms with problematic incentives and to help design algorithms with better incentives.
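
The second question can be illustrated with a small sketch. It assumes a hypothetical single-action diagram encoded as an adjacency dictionary; the node names and the helpers `reaches` and `candidate_influence_nodes` are illustrative and not taken from the paper. The code lists the nodes that have a directed path to the utility node. This is only a coarse necessary condition for an influence incentive (a node with no causal route to the utility cannot affect it), and the precise graphical criteria may well be stricter.

```python
from collections import deque

# Hypothetical single-action diagram (all node names are assumptions).
# Edges point from cause to effect. "action" is the decision node,
# "reward" is the utility node, and state -> action is an information link.
EDGES = {
    "state": ["action", "next_state"],
    "action": ["next_state", "reward_mechanism"],
    "next_state": ["reward"],
    "reward_mechanism": ["reward"],
    "reward": ["log"],
    "log": [],  # downstream of the reward, so it cannot affect it
}


def reaches(edges, source, target):
    """Return True if a directed path of length >= 1 leads from source to target."""
    queue = deque(edges.get(source, []))
    seen = set()
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(edges.get(node, []))
    return False


def candidate_influence_nodes(edges, decision, utility):
    """Nodes (other than decision and utility) with a directed path to the utility.

    Coarse necessary condition only: passing this check does not by itself
    establish an incentive to influence the node.
    """
    return sorted(
        node
        for node in edges
        if node not in (decision, utility) and reaches(edges, node, utility)
    )


if __name__ == "__main__":
    print(candidate_influence_nodes(EDGES, "action", "reward"))
```

In this toy diagram the reward mechanism appears among the candidates, which matches the concern raised above about an agent taking direct control of its reward mechanism, while the `log` node is ruled out because nothing flows from it back to the reward.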
