A primer on reinforcement learning in the brain: Psychological, computational, and neural perspectives
[1] E. Guthrie. Conditioning as a principle of learning. , 1930 .
[2] W. Brogden. Sensory pre-conditioning. , 1939 .
[3] D. Bernoulli. Exposition of a New Theory on the Measurement of Risk , 1954 .
[4] R. J. Herrnstein,et al. Relative and absolute strength of response as a function of frequency of reinforcement. , 1961, Journal of the experimental analysis of behavior.
[5] John Garcia,et al. Relation of cue to consequence in avoidance learning , 1966 .
[6] L. Kamin. Predictability, surprise, attention, and conditioning , 1967 .
[7] R. Rescorla. Probability of shock in the presence and absence of CS in fear conditioning. , 1968, Journal of comparative and physiological psychology.
[8] R. Herrnstein. On the law of effect. , 1970, Journal of the experimental analysis of behavior.
[9] R. Rescorla. A theory of pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement , 1972 .
[10] A. Tversky,et al. Prospect theory: analysis of decision under risk , 1979 .
[11] R. Rescorla. Simultaneous and successive associations in sensory preconditioning. , 1980, Journal of experimental psychology. Animal behavior processes.
[12] J. Pearce,et al. A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. , 1980 .
[13] Christopher D. Adams,et al. Instrumental Responding following Reinforcer Devaluation , 1981 .
[14] A. G. Barto,et al. Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.
[15] A. Tversky,et al. The framing of decisions and the psychology of choice. , 1981, Science.
[16] R. Rescorla,et al. Postconditioning devaluation of a reinforcer affects instrumental responding. , 1985 .
[17] R. Rescorla. Pavlovian conditioning. It's not what you think it is. , 1988, The American psychologist.
[18] M. Davison,et al. The matching law: A research review. , 1988 .
[19] C. Watkins. Learning from delayed rewards , 1989 .
[20] T. Caraco,et al. Risk-sensitivity: ambient temperature affects foraging choice , 1990, Animal Behaviour.
[21] Richard S. Sutton,et al. Time-Derivative Models of Pavlovian Reinforcement , 1990 .
[22] B. Balleine,et al. Motivational control of goal-directed action , 1994 .
[23] A. Kacelnik,et al. Preferences for fixed and variable food sources: variability in amount and delay. , 1995, Journal of the experimental analysis of behavior.
[24] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[25] Ralph R. Miller,et al. Assessment of the Rescorla-Wagner model. , 1995 .
[26] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[27] P. Dayan,et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.
[28] William Bialek,et al. Spikes: Exploring the Neural Code , 1996 .
[29] Peter Dayan,et al. A Neural Substrate of Prediction and Reward , 1997, Science.
[30] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[31] B. Balleine,et al. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates , 1998, Neuropharmacology.
[32] S. Shafir. Risk-sensitive foraging: the effect of relative variability , 2000 .
[33] C. Gallistel,et al. Time, rate, and conditioning. , 2000, Psychological review.
[34] J. Wickens,et al. A cellular mechanism of reward-related learning , 2001, Nature.
[35] N. Logothetis,et al. Neurophysiological investigation of the basis of the fMRI signal , 2001, Nature.
[36] W. Schultz,et al. Dopamine responses comply with basic assumptions of formal learning theory , 2001, Nature.
[37] J. Pearce,et al. Theories of associative learning in animals. , 2001, Annual review of psychology.
[38] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[39] M. Platt,et al. Neural correlates of decisions , 2002 .
[40] W. Schultz. Getting Formal with Dopamine and Reward , 2002, Neuron.
[41] Eytan Ruppin,et al. Actor-critic models of the basal ganglia: new anatomical and computational perspectives , 2002, Neural Networks.
[42] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[43] A. Kacelnik,et al. Framing effects and risky decisions in starlings , 2002, Proceedings of the National Academy of Sciences of the United States of America.
[44] Colin Camerer,et al. Behavioral Economics: Past, Present, Future , 2003 .
[45] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[46] W. Newsome,et al. Matching Behavior and the Representation of Value in the Parietal Cortex , 2004, Science.
[47] J. J. McDowell,et al. A computational model of selection by consequences. , 2004, Journal of the experimental analysis of behavior.
[48] Richard S. Sutton,et al. Associative search network: A reinforcement learning associative memory , 1981, Biological Cybernetics.
[49] Karl J. Friston,et al. Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning , 2004, Science.
[50] L. Green,et al. A discounting framework for choice with delayed and probabilistic rewards. , 2004, Psychological bulletin.
[51] T. Caraco. Energy budgets, risk and foraging preferences in dark-eyed juncos (Junco hyemalis) , 1981, Behavioral Ecology and Sociobiology.
[52] Matthew T. Kaufman,et al. Distributed Neural Representation of Expected Value , 2005, The Journal of Neuroscience.
[53] K. Doya,et al. Representation of Action-Specific Reward Values in the Striatum , 2005, Science.
[54] R. Poldrack,et al. Prospect theory on the brain? Toward a cognitive neuroscience of decision under risk. , 2005, Brain research. Cognitive brain research.
[55] W. Schultz,et al. Adaptive Coding of Reward Value by Dopamine Neurons , 2005, Science.
[56] P. Dayan,et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.
[57] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[58] W. Newsome,et al. Choosing the greater of two goods: neural currencies for valuation and decision making , 2005, Nature Reviews Neuroscience.
[59] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[60] Peter Dayan,et al. How fast to work: Response vigor, motivation and tonic dopamine , 2005, NIPS.
[61] M. Domjan. Pavlovian conditioning: a functional perspective. , 2005, Annual review of psychology.
[62] P. Glimcher,et al. Dynamic response-by-response models of matching behavior in rhesus monkeys. , 2005, Journal of the experimental analysis of behavior.
[63] Constantin F. Aliferis,et al. Predicting dire outcomes of patients with community acquired pneumonia , 2005, J. Biomed. Informatics.
[64] John McCarthy,et al. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955 , 2006, AI Mag..
[65] P. Dayan,et al. Cortical substrates for exploratory decisions in humans , 2006, Nature.
[66] E. Vaadia,et al. Midbrain dopamine neurons encode decisions for future action , 2006, Nature Neuroscience.
[67] Michael R. Waldmann,et al. Causal Reasoning in Rats , 2006, Science.
[68] David S. Touretzky,et al. Representation and Timing in Theories of the Dopamine System , 2006, Neural Computation.
[69] K. Doya,et al. The computational neurobiology of learning and reward , 2006, Current Opinion in Neurobiology.
[70] A. Tversky,et al. Prospect theory: an analysis of decision under risk , 2007 .
[71] J. O'Doherty,et al. Reward Value Coding Distinct From Risk Attitude-Related Uncertainty Coding in Human Reward Systems , 2006, Journal of neurophysiology.
[72] K. Doya,et al. Multiple Representations of Belief States and Action Values in Corticobasal Ganglia Loops , 2007, Annals of the New York Academy of Sciences.
[73] W. Schultz. Multiple dopamine functions at different time courses. , 2007, Annual review of neuroscience.
[74] Ralph R. Miller,et al. Sometimes-competing retrieval (SOCR): a formalization of the comparator hypothesis. , 2007, Psychological review.
[75] M. Roesch,et al. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards , 2007, Nature Neuroscience.
[76] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[77] Kenji Doya,et al. Reinforcement learning: Computational theory and biological mechanisms , 2007, HFSP journal.
[78] Steven C Stout,et al. Sometimes-competing retrieval (SOCR): a formalization of the comparator hypothesis. , 2007, Psychological review.
[79] Anna Koop,et al. Learning to Generalize through Predictive Representations: A Computational Model of Mediated Conditioning , 2008, SAB.
[80] Colin Camerer,et al. A framework for studying the neurobiology of value-based decision making , 2008, Nature Reviews Neuroscience.
[81] Richard S. Sutton,et al. Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System , 2008, Neural Computation.
[82] Gal Yadid,et al. Dynamics of the dopaminergic system as a key component to the understanding of depression. , 2008, Progress in brain research.
[83] P. Dayan,et al. Decision theory, reinforcement learning, and the brain , 2008, Cognitive, affective & behavioral neuroscience.
[84] Y. Niv,et al. Dialogues on prediction errors , 2008, Trends in Cognitive Sciences.
[85] Richard S. Sutton,et al. A computational model of hippocampal function in trace conditioning , 2008, NIPS.
[86] Yutaka Sakai,et al. The Actor-Critic Learning Is Behind the Matching Law: Matching Versus Optimal Behaviors , 2008, Neural Computation.
[87] Douglas A. Williams,et al. Timed excitatory conditioning under zero and negative contingencies. , 2008, Journal of experimental psychology. Animal behavior processes.
[88] Timothy E. J. Behrens,et al. Choice, uncertainty and value in prefrontal and cingulate cortex , 2008, Nature Neuroscience.
[89] W. Schultz. Introduction. Neuroeconomics: the promise and the profit , 2008, Philosophical Transactions of the Royal Society B: Biological Sciences.
[90] P. Dayan,et al. Reinforcement learning: The Good, The Bad and The Ugly , 2008, Current Opinion in Neurobiology.
[91] Daniel A. Gottlieb. Is the number of trials a primary determinant of conditioned responding? , 2008, Journal of experimental psychology. Animal behavior processes.
[92] M. Platt,et al. Risky business: the neuroeconomics of decision making under uncertainty , 2008, Nature Neuroscience.
[93] J. Staddon,et al. The behavioral economics of choice and interval timing. , 2009, Psychological review.
[94] Klaus Wunderlich,et al. Neural computations underlying action-based decision making in the human brain , 2009, Proceedings of the National Academy of Sciences.
[95] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[96] Nasimeh Asgarian,et al. Learning to predict relapse in invasive ductal carcinomas based on the subcellular localization of junctional proteins , 2010, Breast Cancer Research and Treatment.
[97] Y. Niv. Reinforcement learning in the brain , 2009 .
[98] T. Maia. Reinforcement learning, conditioning, and the brain: Successes and challenges , 2009, Cognitive, affective & behavioral neuroscience.
[99] H. Sebastian Seung,et al. Operant Matching as a Nash Equilibrium of an Intertemporal Game , 2009, Neural Computation.
[100] Zeb Kurth-Nelson,et al. Temporal-Difference Reinforcement Learning with Distributed Representations , 2009, PloS one.
[101] S. Kennerley,et al. Evaluating choices by single neurons in the frontal lobe: outcome value encoded across multiple decision variables , 2009, The European journal of neuroscience.
[102] B. Love,et al. Short-term gains, long-term pains: How cues about state aid learning in dynamic environments , 2009, Cognition.
[103] I. Izquierdo,et al. Dopamine Controls Persistence of Long-Term Memory Storage , 2009, Science.
[104] R. C. Honey,et al. "Causal reasoning" in rats: a reappraisal. , 2009, Journal of experimental psychology. Animal behavior processes.
[105] Jonathan D. Cohen,et al. Explicit melioration by a neural diffusion model , 2009, Brain Research.
[106] K. Doya,et al. Validation of Decision-Making Models and Analysis of Decision Variables in the Rat Basal Ganglia , 2009, The Journal of Neuroscience.
[107] Jung Hoon Sul,et al. Role of Striatum in Updating Values of Chosen Actions , 2009, The Journal of Neuroscience.
[108] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[109] M. Rushworth,et al. General mechanisms for making decisions? , 2009, Current Opinion in Neurobiology.
[110] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[111] K. Deisseroth,et al. Phasic Firing in Dopaminergic Neurons Is Sufficient for Behavioral Conditioning , 2009, Science.
[112] C. Pennartz,et al. Single-Cell and Population Coding of Expected Reward Probability in the Orbitofrontal Cortex of the Rat , 2009, The Journal of Neuroscience.
[113] C. Gallistel,et al. Memory and the Computational Brain , 2009 .
[114] M. Roesch,et al. Ventral Striatal Neurons Encode the Value of the Chosen Action in Rats Deciding between Differently Delayed or Sized Rewards , 2009, The Journal of Neuroscience.
[115] M. Roesch,et al. A new perspective on the role of the orbitofrontal cortex in adaptive behaviour , 2009, Nature Reviews Neuroscience.
[116] A. Hama. Predictably Irrational: The Hidden Forces That Shape Our Decisions , 2010 .
[117] Mirko Farina. Supersizing the Mind: Embodiment, Action and Cognitive Extension. , 2010 .
[118] B. Balleine,et al. Human and Rodent Homologies in Action Control: Corticostriatal Determinants of Goal-Directed and Habitual Action , 2010, Neuropsychopharmacology.
[119] I. P. Pavlov. Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. , 1929, Annals of Neurosciences.