Avoiding Wireheading with Value Reinforcement Learning

How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
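Below is a minimal toy sketch, in Python, of the idea the abstract describes: the reward signal is treated as evidence about a latent utility function rather than as the quantity to maximise directly. The candidate utility functions, the Gaussian noise model, and the "hack_sensor" action are illustrative assumptions of this sketch, not the paper's formalism, which works in a general sequential setting and removes the wireheading incentive via a constraint defined on the agent's belief distributions.

import math

# Hypothetical class of candidate utility functions over world-states
# (an assumption of this sketch, not the paper's notation).
UTILITY_CLASS = {
    "u_task":   {"task_done": 1.0, "idle": 0.0, "sensor_hacked": 0.0},
    "u_indiff": {"task_done": 0.5, "idle": 0.5, "sensor_hacked": 0.5},
}

# Uniform prior belief over which candidate utility function is the true one.
belief = {name: 1.0 / len(UTILITY_CLASS) for name in UTILITY_CLASS}

def reward_likelihood(utility, state, observed_reward, noise=0.1):
    # Likelihood of the observed reward if `utility` were the true utility
    # function and the reward sensor reported honestly (Gaussian noise model,
    # assumed here for simplicity).
    return math.exp(-((observed_reward - utility[state]) ** 2) / (2 * noise ** 2))

def update_belief(belief, state, observed_reward):
    # Bayesian update: the reward signal is evidence about the latent
    # utility function, not the objective itself.
    posterior = {
        name: p * reward_likelihood(UTILITY_CLASS[name], state, observed_reward)
        for name, p in belief.items()
    }
    z = sum(posterior.values())
    return {name: p / z for name, p in posterior.items()}

def expected_utility(belief, state):
    # Value of a state under the agent's *current* belief over utility functions.
    return sum(p * UTILITY_CLASS[name][state] for name, p in belief.items())

# Contrast with plain RL: an action that forces the reward sensor to report
# maximal reward is optimal for a reward maximiser, but it does not raise
# expected utility under the current belief, so this value-learning agent
# has no incentive to take it.
actions = {"work": "task_done", "hack_sensor": "sensor_hacked"}
for action, resulting_state in actions.items():
    print(action, round(expected_utility(belief, resulting_state), 3))

# Learning from reward: observing reward 1.0 in state "task_done"
# shifts the belief towards u_task.
belief = update_belief(belief, "task_done", 1.0)
print(belief)

The design point this sketch is meant to illustrate is that action selection maximises expected utility under the agent's current belief rather than anticipated future reward. The paper's actual mechanism is a constraint on the agent's belief distributions in a sequential setting, which this toy example does not capture.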
