Reward is enough
Doina Precup | David Silver | Satinder Singh | Richard Sutton
[1] Alec Radford,et al. Learning to summarize from human feedback , 2020, NeurIPS.
[2] Richard L. Lewis,et al. A new approach to exploring language emergence as boundedly optimal control in the face of environmental and cognitive constraints , 2010 .
[3] Yoav Shoham,et al. If multi-agent learning is the answer, what is the question? , 2007, Artif. Intell.
[4] Shane Legg,et al. Universal Intelligence: A Definition of Machine Intelligence , 2007, Minds and Machines.
[5] M. Weber. Economy and society : an outline of interpretive sociology , 2008 .
[6] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[7] Richard L. Lewis,et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.
[8] E. Tolman. Purposive behavior in animals and men , 1932 .
[9] A. M. Turing,et al. Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.
[10] Geoffrey E. Hinton,et al. Unsupervised learning : foundations of neural computation , 1999 .
[11] G. Becker,et al. The Economic Approach to Human Behavior , 1978 .
[12] Karl J. Friston,et al. Action and Perception as Divergence Minimization , 2020, ArXiv.
[13] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[14] Olivier Pietquin,et al. Observational Learning by Reinforcement Learning , 2017, AAMAS.
[15] Devika Subramanian,et al. Provably Bounded Optimal Agents , 1993, IJCAI.
[16] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[17] Claude Sammut,et al. A Framework for Behavioural Cloning , 1995, Machine Intelligence 15.
[18] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[19] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[20] Karl J. Friston. The free-energy principle: a unified brain theory? , 2010, Nature Reviews Neuroscience.
[21] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[22] R. Polanía,et al. Rational inattention in mice , 2021, bioRxiv.
[23] Marc Toussaint,et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference , 2012, Robotics: Science and Systems.
[24] G. Debreu. Mathematical Economics: Representation of a preference ordering by a numerical function , 1983 .
[25] Sergey Levine,et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.
[26] Xin Zhang,et al. End to End Learning for Self-Driving Cars , 2016, ArXiv.
[27] H. Simon,et al. A Behavioral Model of Rational Choice , 1955 .
[28] Richard S. Sutton,et al. Multi-timescale nexting in a reinforcement learning robot , 2011, Adapt. Behav.
[29] J. Neumann. Zur Theorie der Gesellschaftsspiele , 1928 .
[30] A. Clark. Whatever next? Predictive brains, situated agents, and the future of cognitive science. , 2013, The Behavioral and brain sciences.
[31] B. Skinner,et al. The Behavior of Organisms: An Experimental Analysis , 2016 .
[32] Satinder Singh,et al. Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization , 2014, Top. Cogn. Sci.
[33] J. Nash. Equilibrium Points in N-Person Games. , 1950, Proceedings of the National Academy of Sciences of the United States of America.
[34] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res.
[35] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[36] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[37] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[38] Marcus Hutter,et al. Universal artificial intelligence , 2004 .
[39] Laurent Orseau,et al. Space-Time Embedded Intelligence , 2012, AGI.
[40] Stefan Schaal,et al. Learning from Demonstration , 1996, NIPS.
[41] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[42] Todd A. Hare,et al. Neural codes in early sensory areas maximize fitness , 2021, bioRxiv.
[43] J. Hawkins,et al. On Intelligence , 2004 .
[44] Amos Storkey,et al. Meta-Learning in Neural Networks: A Survey , 2020, IEEE transactions on pattern analysis and machine intelligence.
[45] M. P. Friedman,et al. HANDBOOK OF PERCEPTION , 1977 .
[46] Martin Müller,et al. Computer Go , 2002, Artif. Intell.
[47] Joaquin Vanschoren,et al. Meta-Learning: A Survey , 2018, ArXiv.
[48] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[49] A. Bandura. Social learning theory , 1977 .
[50] Ronald Ashri. What Is AI , 2020 .
[51] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[52] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[53] Hinrich Schütze,et al. Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.
[54] J. Goodman. Note on Existence and Uniqueness of Equilibrium Points for Concave N-Person Games , 1965 .
[55] John R Anderson,et al. An integrated theory of the mind. , 2004, Psychological review.
[56] A. Newell. Unified Theories of Cognition , 1990 .
[57] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[58] Anind K. Dey,et al. Modeling Interaction via the Principle of Maximum Causal Entropy , 2010, ICML.