[1] Stuart Armstrong et al. Good and safe uses of AI Oracles, 2017, ArXiv.
[2] Geoffrey J. Gordon et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[3] Zeb Kurth-Nelson et al. Learning to reinforcement learn, 2016, CogSci.
[4] Daniel Dewey et al. Learning What to Value, 2011, AGI.
[5] Joshua A. Tucker et al. Is Online Political Communication More Than an Echo Chamber?, 2022.
[6] Jure Leskovec et al. Steering user behavior with badges, 2013, WWW.
[7] Marcin Andrychowicz et al. Solving Rubik's Cube with a Robot Hand, 2019, ArXiv.
[8] A. Tversky et al. Judgment under Uncertainty, 1982.
[9] Koen Holtman et al. Corrigibility with Utility Preservation, 2019, ArXiv.
[10] Sergey Levine et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[11] Laurent Orseau et al. Reinforcement Learning with a Corrupted Reward Channel, 2017, IJCAI.
[12] Nando de Freitas et al. RL Unplugged: Benchmarks for Offline Reinforcement Learning, 2020, ArXiv.
[13] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Justin M. Rao et al. Filter Bubbles, Echo Chambers, and Online News Consumption, 2016.
[15] Shawn P. Curley et al. Do Recommender Systems Manipulate Consumer Preferences? A Study of Anchoring Effects, 2013, Inf. Syst. Res..
[16] Arushi Majha et al. Categorizing Wireheading in Partially Embedded Agents, 2019, AISafety@IJCAI.
[17] Weiyan Shi et al. Effects of Persuasive Dialogues: Testing Bot Identities and Inquiry Strategies, 2020, CHI.
[18] Nando de Freitas et al. Neural Programmer-Interpreters, 2015, ICLR.
[19] Nick Bostrom et al. Thinking Inside the Box: Controlling and Using an Oracle AI, 2012, Minds and Machines.
[20] Anind K. Dey et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[21] Marcus Hutter et al. Reinforcement learning with value advice, 2014, ACML.
[22] Yee Whye Teh et al. Neural probabilistic motor primitives for humanoid control, 2018, ICLR.
[23] Richard L. Lewis et al. Where Do Rewards Come From, 2009.
[24] Marcus Hutter et al. Asymptotically Unambitious Artificial General Intelligence, 2019, AAAI.
[25] Laurent Orseau et al. Delusion, Survival, and Intelligent Agents, 2011, AGI.
[26] Scott Garrabrant et al. Risks from Learned Optimization in Advanced Machine Learning Systems, 2019, ArXiv.
[27] Shane Legg et al. Scalable agent alignment via reward modeling: a research direction, 2018, ArXiv.
[28] Tom Everitt et al. Towards Safe Artificial General Intelligence, 2018.
[29] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell..
[30] Koen Holtman et al. Towards AGI Agent Safety by Iteratively Improving the Utility Function, 2020, AGI.
[31] Jason Mancuso et al. Detecting Spiky Corruption in Markov Decision Processes, 2019, AISafety@IJCAI.
[32] Tom Eccles et al. An investigation of model-free planning, 2019, ICML.
[33] Marcus Hutter et al. Avoiding Wireheading with Value Reinforcement Learning, 2016, AGI.
[34] Laurent Orseau et al. Pitfalls of learning a reward function online, 2020, IJCAI.
[35] D. Kahneman et al. Heuristics and Biases: The Psychology of Intuitive Judgment, 2002.
[36] Anca D. Dragan et al. Inverse Reward Design, 2017, NIPS.
[37] Anca D. Dragan et al. Reward-rational (implicit) choice: A unifying formalism for reward learning, 2020, NeurIPS.
[38] Peter L. Bartlett et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[39] Laurent Orseau et al. Safely Interruptible Agents, 2016, UAI.
[40] P. Stone et al. TAMER: Training an Agent Manually via Evaluative Reinforcement, 2008, 7th IEEE International Conference on Development and Learning.
[41] Anca D. Dragan et al. The Off-Switch Game, 2016, IJCAI.
[42] John Schulman et al. Concrete Problems in AI Safety, 2016, ArXiv.
[43] Nick Bostrom et al. Superintelligence: Paths, Dangers, Strategies, 2014.
[44] D. Kahneman. Maps of Bounded Rationality: Psychology for Behavioral Economics, 2003.
[45] Karol Hausman et al. Learning to Interactively Learn and Assist, 2019, AAAI.
[46] Sridhar Mahadevan et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst..
[47] Anca D. Dragan et al. Cooperative Inverse Reinforcement Learning, 2016, NIPS.