William Fedus | Carles Gelada | Yoshua Bengio | Marc G. Bellemare | Hugo Larochelle
[1] P. Samuelson. A Note on Measurement of Utility, 1937.
[2] R. H. Strotz. Myopia and Inconsistency in Dynamic Utility Maximization, 1955.
[3] R. Bellman. A Markovian Decision Process, 1957.
[4] Richard Bellman, et al. On a Routing Problem, 1958.
[5] G. Ainslie. Specious reward: a behavioral theory of impulsiveness and impulse control, 1975, Psychological Bulletin.
[6] L. Green, et al. Preference reversal and self control: choice as a function of reward amount and delay, 1981.
[7] J. E. Mazur. Probability and delay of reinforcement as factors in discrete-trial choice, 1985, Journal of the Experimental Analysis of Behavior.
[8] J. E. Mazur. An adjusting procedure for studying delayed reinforcement, 1987.
[9] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.
[10] S. C. Suddarth, et al. Rule-Injection Hints as a Means of Improving Network Performance and Learning Time, 1990, EURASIP Workshop.
[11] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[12] Satinder P. Singh, et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models, 1992, ML.
[13] Ming Tan, et al. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents, 1993, ICML.
[14] Eugene A. Feinberg, et al. Markov Decision Models with Weighted Discounted Criteria, 1994, Math. Oper. Res.
[15] L. Green, et al. Temporal discounting and preference reversals in choice between delayed outcomes, 1994, Psychonomic Bulletin & Review.
[16] John N. Tsitsiklis, et al. Neuro-dynamic programming: an overview, 1995, Proceedings of the 34th IEEE Conference on Decision and Control.
[17] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[18] L. Green, et al. Discounting of delayed rewards: Models of individual choice, 1995, Journal of the Experimental Analysis of Behavior.
[19] P. Dayan, et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning, 1996, The Journal of Neuroscience.
[20] G. Loewenstein. Out of control: Visceral influences on behavior, 1996.
[21] J. E. Mazur. Choice, delay, probability, and conditioned reinforcement, 1997.
[22] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[23] A. Kacelnik. Normative and descriptive models of decision making: time discounting and risk sensitivity, 1997, Ciba Foundation Symposium.
[24] L. Green, et al. Rate of temporal discounting decreases with amount of reward, 1997, Memory & Cognition.
[25] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[26] Peter D. Sozou, et al. On hyperbolic discounting and uncertain hazard rates, 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.
[27] Eugene A. Feinberg, et al. Constrained dynamic programming with two discount factors: applications and an algorithm, 1999, IEEE Trans. Autom. Control.
[28] David S. Touretzky, et al. Behavioral considerations suggest an average reward TD model of the dopamine system, 2000, Neurocomputing.
[29] G. Loewenstein, et al. Time Discounting and Time Preference: A Critical Review, 2002.
[30] N. Daw, et al. Reinforcement learning models of the dopamine system and their behavioral implications, 2003.
[31] Saori C. Tanaka, et al. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops, 2004, Nature Neuroscience.
[32] L. Green, et al. A discounting framework for choice with delayed and probabilistic rewards, 2004, Psychological Bulletin.
[33] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[34] Edmund H. Durfee, et al. Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors, 2005, IJCAI.
[35] E. Maskin, et al. Uncertainty and Hyperbolic Discounting, 2005.
[36] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[37] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[38] John R. Anderson, et al. From recurrent choice to skill learning: a reinforcement-learning model, 2006, Journal of Experimental Psychology: General.
[39] R. French. Catastrophic Forgetting in Connectionist Networks, 2006.
[40] Colin Camerer, et al. A framework for studying the neurobiology of value-based decision making, 2008, Nature Reviews Neuroscience.
[41] Saori C. Tanaka, et al. Low-Serotonin Levels Increase Delayed Reward Discounting in Humans, 2008, The Journal of Neuroscience.
[42] T. Maia. Reinforcement learning, conditioning, and the brain: Successes and challenges, 2009, Cognitive, Affective & Behavioral Neuroscience.
[43] Zeb Kurth-Nelson, et al. Temporal-Difference Reinforcement Learning with Distributed Representations, 2009, PLoS ONE.
[44] Z. Kurth-Nelson, et al. Neural Models of Temporal Discounting, 2009.
[45] William H. Alexander, et al. Hyperbolically Discounted Temporal Difference Learning, 2010, Neural Computation.
[46] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[47] Tor Lattimore, et al. Time Consistent Discounting, 2011, ALT.
[48] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[49] Michael L. Littman, et al. Expressing Tasks Robustly via Multiple Discount Factors, 2015.
[50] Damien Ernst, et al. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, 2015, arXiv.
[51] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[52] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2015, IJCAI.
[53] G. Ainslie. Picoeconomics: The Strategic Interaction of Successive Motivational States within the Person, 1992.
[54] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[55] Kenji Doya, et al. Average Reward Optimization with Multiple Discounting Reinforcement Learners, 2017, ICONIP.
[56] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[57] Razvan Pascanu, et al. Learning to Navigate in Complex Environments, 2016, ICLR.
[58] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[59] Guillaume Lample, et al. Playing FPS Games with Deep Reinforcement Learning, 2016, AAAI.
[60] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[61] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[62] Satinder Singh, et al. Many-Goals Reinforcement Learning, 2018, arXiv.
[63] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[64] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, arXiv.
[65] P. Pilarski, et al. Generalizing Value Estimation over Timescale, 2018.
[66] Silviu Pitis, et al. Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach, 2019, AAAI.
[67] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[68] Joelle Pineau, et al. Separating value functions across time-scales, 2019, ICML.