Avoiding Wireheading with Value Reinforcement Learning

How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
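Below is a minimal toy sketch, in Python, of the idea the abstract describes: the reward signal is treated as evidence about a latent utility function rather than as the quantity to maximise directly. The candidate utility functions, the Gaussian noise model, and the "hack_sensor" action are illustrative assumptions of this sketch, not the paper's formalism, which works in a general sequential setting and removes the wireheading incentive via a constraint defined on the agent's belief distributions.

import math

# Hypothetical class of candidate utility functions over world-states
# (an assumption of this sketch, not the paper's notation).
UTILITY_CLASS = {
    "u_task":   {"task_done": 1.0, "idle": 0.0, "sensor_hacked": 0.0},
    "u_indiff": {"task_done": 0.5, "idle": 0.5, "sensor_hacked": 0.5},
}

# Uniform prior belief over which candidate utility function is the true one.
belief = {name: 1.0 / len(UTILITY_CLASS) for name in UTILITY_CLASS}

def reward_likelihood(utility, state, observed_reward, noise=0.1):
    # Likelihood of the observed reward if `utility` were the true utility
    # function and the reward sensor reported honestly (Gaussian noise model,
    # assumed here for simplicity).
    return math.exp(-((observed_reward - utility[state]) ** 2) / (2 * noise ** 2))

def update_belief(belief, state, observed_reward):
    # Bayesian update: the reward signal is evidence about the latent
    # utility function, not the objective itself.
    posterior = {
        name: p * reward_likelihood(UTILITY_CLASS[name], state, observed_reward)
        for name, p in belief.items()
    }
    z = sum(posterior.values())
    return {name: p / z for name, p in posterior.items()}

def expected_utility(belief, state):
    # Value of a state under the agent's *current* belief over utility functions.
    return sum(p * UTILITY_CLASS[name][state] for name, p in belief.items())

# Contrast with plain RL: an action that forces the reward sensor to report
# maximal reward is optimal for a reward maximiser, but it does not raise
# expected utility under the current belief, so this value-learning agent
# has no incentive to take it.
actions = {"work": "task_done", "hack_sensor": "sensor_hacked"}
for action, resulting_state in actions.items():
    print(action, round(expected_utility(belief, resulting_state), 3))

# Learning from reward: observing reward 1.0 in state "task_done"
# shifts the belief towards u_task.
belief = update_belief(belief, "task_done", 1.0)
print(belief)

The design point this sketch is meant to illustrate is that action selection maximises expected utility under the agent's current belief rather than anticipated future reward. The paper's actual mechanism is a constraint on the agent's belief distributions in a sequential setting, which this toy example does not capture.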
