Reinforcement Learning with a Corrupted Reward Channel
Laurent Orseau | Shane Legg | Tom Everitt | Victoria Krakovna
[1] Ming Li,et al. Average Case Complexity Under the Universal Distribution Equals Worst-Case Complexity , 1992, Inf. Process. Lett..
[2] Noah D. Goodman,et al. Learning the Preferences of Ignorant, Inconsistent Agents , 2015, AAAI.
[3] David H. Wolpert,et al. No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..
[4] Jessica Taylor,et al. Quantilizers: A Safer Alternative to Maximizers for Limited Optimization , 2016, AAAI Workshop: AI, Ethics, and Society.
[5] Roman V. Yampolskiy,et al. Utility function security in artificially intelligent agents , 2014, J. Exp. Theor. Artif. Intell..
[6] Anca D. Dragan,et al. The Off-Switch Game , 2016, IJCAI.
[7] Marcus Hutter,et al. Universal Artificial Intelligence - Sequential Decisions Based on Algorithmic Probability , 2005, Texts in Theoretical Computer Science. An EATCS Series.
[8] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[9] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[10] Nick Bostrom. Superintelligence: Paths, Dangers, Strategies , 2014 .
[11] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[12] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[13] Marcus Hutter,et al. Universal Reinforcement Learning Algorithms: Survey and Experiments , 2017, IJCAI.
[14] Anca D. Dragan,et al. Cooperative Inverse Reinforcement Learning , 2016, NIPS.
[15] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[16] Laurent Orseau,et al. Delusion, Survival, and Intelligent Agents , 2011, AGI.
[17] Mark O. Riedl,et al. Using Stories to Teach Human Values to Artificial Agents , 2016, AAAI Workshop: AI, Ethics, and Society.
[18] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[19] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..