Paper information - Reward is enough

Reward is enough

Abstract: In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.

[1] Alec Radford, et al. Learning to summarize from human feedback, 2020, NeurIPS.

[2] Richard L. Lewis, et al. A new approach to exploring language emergence as boundedly optimal control in the face of environmental and cognitive constraints, 2010.

[3] Yoav Shoham, et al. If multi-agent learning is the answer, what is the question?, 2007, Artif. Intell.

[4] Shane Legg, et al. Universal Intelligence: A Definition of Machine Intelligence, 2007, Minds and Machines.

[5] M. Weber. Economy and society: an outline of interpretive sociology, 2008.

[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[7] Richard L. Lewis, et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective, 2010, IEEE Transactions on Autonomous Mental Development.

[8] E. Tolman. Purposive behavior in animals and men, 1932.

[9] A. M. Turing, et al. Computing Machinery and Intelligence, 1950, The Philosophy of Artificial Intelligence.

[10] Geoffrey E. Hinton, et al. Unsupervised learning: foundations of neural computation, 1999.

[11] G. Becker, et al. The Economic Approach to Human Behavior, 1978.

[12] Karl J. Friston, et al. Action and Perception as Divergence Minimization, 2020, ArXiv.

[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[14] Olivier Pietquin, et al. Observational Learning by Reinforcement Learning, 2017, AAMAS.

[15] Devika Subramanian, et al. Provably Bounded Optimal Agents, 1993, IJCAI.

[16] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.

[17] Claude Sammut, et al. A Framework for Behavioural Cloning, 1995, Machine Intelligence 15.

[18] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[19] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.

[20] Karl J. Friston. The free-energy principle: a unified brain theory?, 2010, Nature Reviews Neuroscience.

[21] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[22] R. Polanía, et al. Rational inattention in mice, 2021, bioRxiv.

[23] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.

[24] G. Debreu. Mathematical Economics: Representation of a preference ordering by a numerical function, 1983.

[25] Sergey Levine, et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, ArXiv.

[26] Xin Zhang, et al. End to End Learning for Self-Driving Cars, 2016, ArXiv.

[27] H. Simon, et al. A Behavioral Model of Rational Choice, 1955.

[28] Richard S. Sutton, et al. Multi-timescale nexting in a reinforcement learning robot, 2011, Adapt. Behav.

[29] J. Neumann. Zur Theorie der Gesellschaftsspiele, 1928.

[30] A. Clark. Whatever next? Predictive brains, situated agents, and the future of cognitive science, 2013, The Behavioral and Brain Sciences.

[31] B. Skinner, et al. The Behavior of Organisms: An Experimental Analysis, 2016.

[32] Satinder Singh, et al. Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization, 2014, Top. Cogn. Sci.

[33] J. Nash. Equilibrium Points in N-Person Games, 1950, Proceedings of the National Academy of Sciences of the United States of America.

[34] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.

[35] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[36] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.

[37] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.

[38] Marcus Hutter, et al. Universal artificial intelligence, 2004.

[39] Laurent Orseau, et al. Space-Time Embedded Intelligence, 2012, AGI.

[40] Stefan Schaal, et al. Learning from Demonstration, 1996, NIPS.

[41] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[42] Todd A. Hare, et al. Neural codes in early sensory areas maximize fitness, 2021, bioRxiv.

[43] J. Hawkins, et al. On Intelligence, 2004.

[44] Amos Storkey, et al. Meta-Learning in Neural Networks: A Survey, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45] M. P. Friedman, et al. Handbook of Perception, 1977.

[46] Martin Müller, et al. Computer Go, 2002, Artif. Intell.

[47] Joaquin Vanschoren, et al. Meta-Learning: A Survey, 2018, ArXiv.

[48] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[49] A. Bandura. Social learning theory, 1977.

[50] Ronald Ashri. What Is AI, 2020.

[51] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.

[52] Qiang Yang, et al. A Survey on Transfer Learning, 2010, IEEE Transactions on Knowledge and Data Engineering.

[53] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.

[54] J. Goodman. Note on Existence and Uniqueness of Equilibrium Points for Concave N-Person Games, 1965.

[55] John R Anderson, et al. An integrated theory of the mind, 2004, Psychological Review.

[56] A. Newell. Unified Theories of Cognition, 1990.

[57] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[58] Anind K. Dey, et al. Modeling Interaction via the Principle of Maximum Causal Entropy, 2010, ICML.