Paper information - Reinforcement learning, conditioning, and the brain: Successes and challenges - 字舞流文

Reinforcement learning, conditioning, and the brain: Successes and challenges

[1] E. Rolls,et al. The Orbitofrontal Cortex , 2019 .

[2] M. Commons. The effect of delay and of intervening events on reinforcement value , 2013 .

[3] M. Botvinick,et al. Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective , 2009, Cognition.

[4] O. Hikosaka,et al. Two types of dopamine neuron distinctly convey positive and negative motivational signals , 2009, Nature.

[5] M. Ungless,et al. Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli , 2009, Proceedings of the National Academy of Sciences.

[6] Colin Camerer,et al. Neural Response to Reward Anticipation under Risk Is Nonlinear in Probabilities , 2009, The Journal of Neuroscience.

[7] Richard S. Sutton,et al. Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System , 2008, Neural Computation.

[8] David H. Zald,et al. The Orbitofrontal Cortex , 2008 .

[9] George I. Christopoulos,et al. Neuronal Distortions of Reward Probability without Choice , 2008, The Journal of Neuroscience.

[10] D. Bullock,et al. A Local Circuit Model of Learned Striatal and Dopamine Cell Responses under Probabilistic Schedules of Reward , 2008, The Journal of Neuroscience.

[11] Colin Camerer,et al. Explicit neural signals reflecting reward uncertainty , 2008, Philosophical Transactions of the Royal Society B: Biological Sciences.

[12] Joseph J. Paton,et al. Moment-to-Moment Tracking of State Value in the Amygdala , 2008, The Journal of Neuroscience.

[13] B. Sahakian,et al. Acute Tryptophan Depletion in Healthy Volunteers Enhances Punishment Prediction but Does not Affect Reward Prediction , 2008, Neuropsychopharmacology.

[14] W. Schultz,et al. Influence of Reward Delays on Responses of Dopamine Neurons , 2008, The Journal of Neuroscience.

[15] Daeyeol Lee,et al. Prefrontal Coding of Temporally Discounted Values during Intertemporal Choice , 2008, Neuron.

[16] Y. Niv,et al. Dialogues on prediction errors , 2008, Trends in Cognitive Sciences.

[17] M. Trimble,et al. The Lateral Habenula: No Longer Neglected , 2008, CNS Spectrums.

[18] Kae Nakamura,et al. Reward-Dependent Modulation of Neuronal Activity in the Primate Dorsal Raphe Nucleus , 2008, The Journal of Neuroscience.

[19] N. Chater,et al. The probabilistic mind: prospects for Bayesian cognitive science , 2008 .

[20] Hua Zhao,et al. Lateral habenula lesions improve the behavioral response in depressed rats via increasing the serotonin level in dorsal raphe nucleus , 2008, Behavioural Brain Research.

[21] Samuel M. McClure,et al. BOLD Responses Reflecting Dopaminergic Signals in the Human Ventral Tegmental Area , 2008, Science.

[22] Gregory S. Berns,et al. Nonlinear neurobiological probability weighting functions for aversive outcomes , 2008, NeuroImage.

[23] P. Glimcher,et al. The neural correlates of subjective value during intertemporal choice , 2007, Nature Neuroscience.

[24] B. Richmond,et al. A Comparison of Reward‐Contingent Neuronal Activity in Monkey Orbitofrontal Cortex and Ventral Striatum , 2007, Annals of the New York Academy of Sciences.

[25] N. Rempel-Clower,et al. Role of Orbitofrontal Cortex Connections in Emotion , 2007, Annals of the New York Academy of Sciences.

[26] Matthijs A. A. van der Meer,et al. Integrating hippocampus and striatum in decision-making , 2007, Current Opinion in Neurobiology.

[27] Seth J. Ramus,et al. Linking affect to action: critical contributions of the orbitofrontal cortex. Preface. , 2007, Annals of the New York Academy of Sciences.

[28] N. Daw,et al. Reinforcement Learning Signals in the Human Striatum Distinguish Learners from Nonlearners during Reward-Based Decision Making , 2007, The Journal of Neuroscience.

[29] A. Kacelnik. Normative and descriptive models of decision making: time discounting and risk sensitivity. , 2007, Ciba Foundation symposium.

[30] P. Glimcher,et al. Statistics of midbrain dopamine neuron spike trains in the awake primate. , 2007, Journal of neurophysiology.

[31] R. Wightman,et al. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens , 2007, Nature Neuroscience.

[32] J. E. Mazur. Choice in a successive-encounters procedure and hyperbolic decay of reinforcement. , 2007, Journal of The Experimental Analysis of Behavior.

[33] O. Hikosaka,et al. Lateral habenula as a source of negative reward signals in dopamine neurons , 2007, Nature.

[34] P. Shepard,et al. Lateral Habenula Stimulation Inhibits Rat Midbrain Dopamine Neurons through a GABAA Receptor-Mediated Mechanism , 2007, The Journal of Neuroscience.

[35] D. S. Zahm,et al. Glutamatergic Afferents of the Ventral Tegmental Area in the Rat , 2007, The Journal of Neuroscience.

[36] Adam Johnson,et al. A Computational Model of Craving and Obsession , 2007, Annals of the New York Academy of Sciences.

[37] K. Preuschoff,et al. Adding Prediction Risk to the Theory of Reward Learning , 2007, Annals of the New York Academy of Sciences.

[38] J. Wickens,et al. Striatal Contributions to Reward and Decision Making , 2007 .

[39] K. Doya,et al. Current trends in decision making. , 2007, Annals of the New York Academy of Sciences.

[40] J. O'Doherty,et al. Neural coding of reward-prediction error signals during classical conditioning with attractive faces. , 2007, Journal of neurophysiology.

[41] Brian Knutson,et al. Linking nucleus accumbens dopamine and blood oxygenation , 2007, Psychopharmacology.

[42] J. O'Doherty,et al. Reward Value Coding Distinct From Risk Attitude-Related Uncertainty Coding in Human Reward Systems , 2006, Journal of neurophysiology.

[43] R. Dolan,et al. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans , 2006, Nature.

[44] S. Quartz,et al. Neural Differentiation of Expected Reward and Risk in Human Subcortical Structures , 2006, Neuron.

[45] L. Peoples,et al. Firing patterns of accumbal neurons during a pavlovian-conditioned approach task. , 2006, Journal of neurophysiology.

[46] David S. Touretzky,et al. Representation and Timing in Theories of the Dopamine System , 2006, Neural Computation.

[47] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.

[48] H. Yin,et al. The role of the basal ganglia in habit formation , 2006, Nature Reviews Neuroscience.

[49] Henrik Walter,et al. Prediction error as a linear function of reward probability is coded in human nucleus accumbens , 2006, NeuroImage.

[50] Lawrence R. Frank,et al. Anterior cingulate activity modulates nonlinear decision weight function of uncertain prospects , 2006, NeuroImage.

[51] R. Poldrack,et al. Ventral–striatal/nucleus–accumbens sensitivity to prediction errors during classification learning , 2006, Human brain mapping.

[52] Evan M. Gordon,et al. Neural Signatures of Economic Preferences for Risk and Ambiguity , 2006, Neuron.

[53] Joseph J. Paton,et al. The primate amygdala represents the positive and negative value of visual stimuli during learning , 2006, Nature.

[54] P. Dayan,et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.

[55] A. Graybiel,et al. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories , 2005, Nature.

[56] M. Roesch,et al. Orbitofrontal Cortex, Associative Learning, and Expectancies , 2005, Neuron.

[57] M. Platt,et al. Risk-sensitive neurons in macaque posterior cingulate cortex , 2005, Nature Neuroscience.

[58] A. Parent,et al. The striatofugal fiber system in primates: a reevaluation of its organization based on single-axon tracing studies. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[59] P. Glimcher,et al. Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal , 2005, Neuron.

[60] A M Graybiel,et al. Time-varying covariance of neural activities recorded in striatum and frontal cortex as monkeys perform sequential-saccade tasks. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[61] W. Schultz,et al. Behavioral and Brain Functions , 2005 .

[62] P. Dayan,et al. Dopamine, uncertainty and TD learning , 2005, Behavioral and Brain Functions.

[63] R. Poldrack,et al. Prospect theory on the brain? Toward a cognitive neuroscience of decision under risk. , 2005, Brain research. Cognitive brain research.

[64] W. Schultz,et al. Adaptive Coding of Reward Value by Dopamine Neurons , 2005, Science.

[65] George Loewenstein,et al. CHAPTER ONE. Behavioral Economics: Past, Present, Future , 2004 .

[66] Samuel M. McClure,et al. Separate Neural Systems Value Immediate and Delayed Monetary Rewards , 2004, Science.

[67] L. Green,et al. A discounting framework for choice with delayed and probabilistic rewards. , 2004, Psychological bulletin.

[68] M. Gluck,et al. Human midbrain sensitivity to cognitive feedback and uncertainty during classification learning. , 2004, Journal of neurophysiology.

[69] R. Hertwig,et al. Decisions from Experience and the Effect of Rare Events in Risky Choice , 2004, Psychological science.

[70] E. Vaadia,et al. Coincident but Distinct Messages of Midbrain Dopamine and Striatal Tonically Active Neurons , 2004, Neuron.

[71] Karl J. Friston,et al. Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning , 2004, Science.

[72] M. Gluck,et al. Cortico-striatal contributions to feedback-based learning: converging data from neuroimaging and neuropsychology. , 2004, Brain : a journal of neurology.

[73] D. Plaut,et al. Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action. , 2004, Psychological review.

[74] J. Bolam,et al. Uniform Inhibition of Dopamine Neurons in the Ventral Tegmental Area by Aversive Stimuli , 2004, Science.

[75] Rebecca Elliott,et al. Instrumental responding for rewards is associated with enhanced neuronal response in subcortical reward systems , 2004, NeuroImage.

[76] M. Delgado,et al. Modulation of Caudate Activity by Action Contingency , 2004, Neuron.

[77] O. Hikosaka,et al. Dopamine Neurons Can Represent Context-Dependent Prediction Error , 2004, Neuron.

[78] S. Haber. The primate basal ganglia: parallel and integrative networks , 2003, Journal of Chemical Neuroanatomy.

[79] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.

[80] Nigel H. Goddard,et al. A neural model of frontostriatal interactions for behavioural planning and action chunking , 2003, Neurocomputing.

[81] G. Schoenbaum,et al. Neural Encoding in Ventral Striatum during Olfactory Discrimination Learning , 2003, Neuron.

[82] Karl J. Friston,et al. Temporal Difference Models and Reward-Related Learning in the Human Brain , 2003, Neuron.

[83] Samuel M. McClure,et al. Temporal Prediction Errors in a Passive Learning Task Activate Human Striatum , 2003, Neuron.

[84] T. Jay. Dopamine: a potential substrate for synaptic plasticity and memory mechanisms , 2003, Progress in Neurobiology.

[85] S. Killcross,et al. Coordination of actions and habits in the medial prefrontal cortex of rats. , 2003, Cerebral cortex.

[86] W. Schultz,et al. Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons , 2003, Science.

[87] W. Schultz. Getting Formal with Dopamine and Reward , 2002, Neuron.

[88] J. Holzer. Frontal-Subcortical Circuits in Psychiatric and Neurological Disorders , 2002 .

[89] Sham M. Kakade,et al. Opponent interactions between serotonin and dopamine , 2002, Neural Networks.

[90] Roland E. Suri,et al. TD models of reward predictive responses in dopamine neurons , 2002, Neural Networks.

[91] G. Loewenstein,et al. Time Discounting and Time Preference: A Critical Review , 2002 .

[92] John N. J. Reynolds,et al. Dopamine-dependent plasticity of corticostriatal synapses , 2002, Neural Networks.

[93] Eytan Ruppin,et al. Actor-critic models of the basal ganglia: new anatomical and computational perspectives , 2002, Neural Networks.

[94] B. Everitt,et al. Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex , 2002, Neuroscience & Biobehavioral Reviews.

[95] Kyle L Kirkland,et al. High-Tech Brains: A History of Technology-Based Analogies and Models of Nerve and Brain Function , 2002, Perspectives in biology and medicine.

[96] P. Montague,et al. Activity in human ventral striatum locked to errors of reward prediction , 2002, Nature Neuroscience.

[97] N. Logothetis,et al. Neurophysiological investigation of the basis of the fMRI signal , 2001, Nature.

[98] W. Schultz. Multiple reward signals in the brain , 2000, Nature Reviews Neuroscience.

[99] A. Grace,et al. The tonic/phasic model of dopamine system regulation and its implications for understanding alcohol and psychostimulant craving. , 2000, Addiction.

[100] Leslie Pack Kaelbling,et al. Practical Reinforcement Learning in Continuous Spaces , 2000, ICML.

[101] W. Schultz,et al. Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. , 2000, Journal of neurophysiology.

[102] J. Hollerman,et al. Reward processing in primate orbitofrontal cortex and basal ganglia. , 2000, Cerebral cortex.

[103] J. Horvitz. Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events , 2000, Neuroscience.

[104] K. Hikosaka,et al. Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. , 2000, Cerebral cortex.

[105] C. Cavada,et al. The anatomical connections of the macaque monkey orbitofrontal cortex. A review. , 2000, Cerebral cortex.

[106] D. Joel,et al. The connections of the dopaminergic system with the striatum in rats and primates: an analysis with respect to the functional and compartmental organization of the striatum , 2000, Neuroscience.

[107] C. I. Connolly,et al. Building neural representations of habits. , 1999, Science.

[108] S. Mobini,et al. Theory and method in the quantitative analysis of “impulsive choice” behaviour: implications for psychopharmacology , 1999, Psychopharmacology.

[109] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[110] W. Schultz,et al. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task , 1999, Neuroscience.

[111] T. Gray. Functional and Anatomical Relationships among the Amygdala, Basal Forebrain, Ventral Striatum, and Cortex: An Integrative Discussion , 1999, Annals of the New York Academy of Sciences.

[112] W. Schultz,et al. Relative reward preference in primate orbitofrontal cortex , 1999, Nature.

[113] Ted O’Donoghue,et al. Doing It Now or Later , 1999 .

[114] F. Guarraci,et al. An electrophysiological characterization of ventral tegmental area dopaminergic neurons during differential pavlovian fear conditioning in the awake rabbit , 1999, Behavioural Brain Research.

[115] J. Price,et al. Prefrontal cortical projections to the hypothalamus in Macaque monkeys , 1998, The Journal of comparative neurology.

[116] Peter D. Sozou,et al. On hyperbolic discounting and uncertain hazard rates , 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[117] J. Hollerman,et al. Dopamine neurons report an error in the temporal prediction of reward during learning , 1998, Nature Neuroscience.

[118] A. Graybiel. The Basal Ganglia and Chunking of Action Repertoires , 1998, Neurobiology of Learning and Memory.

[119] Stuart J. Russell,et al. Bayesian Q-Learning , 1998, AAAI/IAAI.

[120] N. Hiroi,et al. Preferential localization of self-stimulation sites in striosomes/patches in the rat striatum. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[121] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[122] P. Calabresi,et al. Synaptic plasticity and physiological interactions between dopamine and glutamate in the striatum , 1997, Neuroscience & Biobehavioral Reviews.

[123] Ashwin Ram,et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..

[124] H. de Wit,et al. Determination of discount functions in rats with an adjusting-amount procedure. , 1997, Journal of the experimental analysis of behavior.

[125] David I. Laibson,et al. Golden Eggs and Hyperbolic Discounting , 1997 .

[126] Peter Dayan,et al. A Neural Substrate of Prediction and Reward , 1997, Science.

[127] V. Grutta,et al. Lateral habenular influence on dorsal raphe neurons , 1996, Brain Research Bulletin.

[128] Jennifer A. Mangels,et al. A Neostriatal Habit Learning System in Humans , 1996, Science.

[129] A. Benabid,et al. Simultaneous Recording of Spontaneous Activities and Nociceptive Responses from Neurons in the Pars Compacta of Substantia Nigra and in the Lateral Habenula , 1996, The European journal of neuroscience.

[130] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[131] P. Dayan,et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[132] G. Loewenstein. Out of control: Visceral influences on behavior , 1996 .

[133] W. Schultz,et al. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli , 1996, Nature.

[134] Kenji Doya,et al. Temporal Difference Learning in Continuous Time and Space , 1995, NIPS.

[135] L. Green,et al. Discounting of delayed rewards: Models of individual choice. , 1995, Journal of the experimental analysis of behavior.

[136] Marvin Minsky,et al. Steps toward Artificial Intelligence , 1961, Proceedings of the IRE.

[137] A. Graybiel,et al. Highly restricted origin of prefrontal cortical inputs to striosomes in the macaque monkey , 1995, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[138] W. Schultz,et al. Importance of unpredictability for reward responses in primate dopamine neurons. , 1994, Journal of neurophysiology.

[139] Leslie Pack Kaelbling,et al. Acting Optimally in Partially Observable Stochastic Domains , 1994, AAAI.

[140] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[141] W. Schultz,et al. Neuronal activity in monkey ventral striatum related to the expectation of reward , 1992, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[142] A. Tversky,et al. Advances in prospect theory: Cumulative representation of uncertainty , 1992 .

[143] C. Geula,et al. Cytoarchitecture and neural afferents of orbitofrontal cortex in the brain of the monkey , 1992, The Journal of comparative neurology.

[144] A. Grace. Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: A hypothesis for the etiology of schizophrenia , 1991, Neuroscience.

[145] G. Loewenstein,et al. Decision making over time and under uncertainty: a common approach , 1991 .

[146] D. Cross,et al. Subjective probability and delay. , 1991, Journal of the experimental analysis of behavior.

[147] M. Gabriel,et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks , 1990 .

[148] A. Graybiel. Neurotransmitters and neuromodulators in the basal ganglia , 1990, Trends in Neurosciences.

[149] R. Strecker,et al. Regulation of striatal serotonin release by the lateral habenula-dorsal raphe pathway in the rat as demonstrated by in vivo microdialysis: role of excitatory amino acids and GABA , 1989, Brain Research.

[150] J. Glowinski,et al. Effect of noxious tail pinch on the discharge rate of mesocortical and mesolimbic dopamine neurons: selective activation of the mesocortical system , 1989, Brain Research.

[151] B. Jacobs,et al. Lack of response of serotonergic neurons in the dorsal raphe nucleus of freely moving cats to stressful stimuli , 1988, Experimental Neurology.

[152] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[153] Melburn R. Park. Monosynaptic inhibitory postsynaptic potentials from lateral habenula recorded in dorsal raphe neurons , 1987, Brain Research Bulletin.

[154] R. Oades,et al. Ventral tegmental (A10) system: neurobiology. 1. Anatomy and connectivity , 1987, Brain Research Reviews.

[155] K. Wilcox,et al. Stimulation of the lateral habenula inhibits dopamine-containing neurons in the substantia nigra and ventral tegmental area of the rat , 1986, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[156] Geoffrey E. Hinton,et al. Distributed Representations , 1986, The Philosophy of Artificial Intelligence.

[157] C. Gerfen. The neostriatal mosaic. I. compartmental organization of projections from the striatum to the substantia nigra in the rat , 1985, The Journal of comparative neurology.

[158] A. Dickinson. Actions and habits: the development of behavioural autonomy , 1985 .

[159] J. Elster. Ulysses and the Sirens: Studies in Rationality and Irrationality , 1985 .

[160] B. Frost,et al. Functional activity in the lateral habenular and dorsal raphe nuclei following administration of several dopamine receptor antagonists. , 1984, Canadian Journal of Physiology and Pharmacology.

[161] C. Gerfen. The neostriatal mosaic: compartmentalization of corticostriatal input and striatonigral output systems , 1984, Nature.

[162] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[163] P. Soubrié,et al. The involvement of nigral serotonin innervation in the control of punishment-induced behavioral inhibition in rats , 1983, Pharmacology Biochemistry and Behavior.

[164] L. Brown,et al. A dopamine-sensitive striatal efferent system mapped with [14C]deoxyglucose in the rat , 1983, Brain Research.

[165] P. Soubrié,et al. Involvement of lateral habenula-dorsal raphe neurons in the differential regulation of striatal and nigral serotonergic transmission in cats , 1982, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[166] Christopher D. Adams. Variations in the Sensitivity of Instrumental Responding to Reinforcer Devaluation , 1982 .

[167] R. C. Collins,et al. Metabolic effects of unilateral lesion of the substantia nigra , 1981, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[168] Flint Schier. Ulysses and the Sirens , 1980 .

[169] John F. Marshall,et al. Plasticity of [14C]2-deoxy-d-glucose incorporation into neostriatum and related structures in response to dopamine neuron damage and apomorphine replacement , 1980, Brain Research.

[170] L. Sokoloff,et al. Influence of dopaminergic systems on the lateral habenular nucleus of the rat , 1980, Brain Research.

[171] W. Nauta,et al. Efferent connections of the habenular nuclei in the rat , 1979, The Journal of comparative neurology.

[172] Warren C. Stern,et al. Effects of electrical stimulation of the lateral habenula on single-unit activity of raphe neurons , 1979, Experimental Neurology.

[173] C. W. Ragsdale,et al. Histochemically distinct compartments in the striatum of human, monkeys, and cat demonstrated by acetylthiocholinesterase staining. , 1978, Proceedings of the National Academy of Sciences of the United States of America.

[174] S. Iversen,et al. 5-Hydroxytryptamine and punishment , 1977, Nature.

[175] Ian H. Witten,et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control..

[176] G. Aghajanian,et al. Physiological evidence for habenula as major link between forebrain and midbrain raphe. , 1977, Science.

[177] G. Ainslie. Specious reward: a behavioral theory of impulsiveness and impulse control. , 1975, Psychological bulletin.

[178] A. Battersby. Plans and the Structure of Behavior , 1968 .

[179] G. Miller,et al. Plans and the structure of behavior , 1960 .

[180] S. S. Stevens. On the psychophysical law. , 1957, Psychological review.

[181] E. Rowland. Theory of Games and Economic Behavior , 1946, Nature.

[182] P. Samuelson. A Note on Measurement of Utility , 1937 .

[183] E. Thorndike. “Animal Intelligence” , 1898, Nature.

[184] A. Cooper,et al. Predictive Reward Signal of Dopamine Neurons , 2011 .

[185] P. Schrimpf,et al. Dynamic Programming , 2011 .

[186] O. Hikosaka,et al. Representation of negative motivational value in the primate lateral habenula , 2009, Nature Neuroscience.

[187] M. Ungless,et al. Phasic nociceptive responses in dorsal raphe serotonin neurons , 2008 .

[188] J. Wickens,et al. Striatal contributions to reward and decision making: making sense of regional variations in a reiterated processing matrix. , 2007, Annals of the New York Academy of Sciences.

[189] George E. Monahan,et al. A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms , 2007 .

[190] B. Balleine. Reward and decision making in corticobasal ganglia networks , 2007 .

[191] Sanjay Kaul,et al. Trial and error. How to avoid commonly encountered limitations of published clinical trials. , 2010, Journal of the American College of Cardiology.

[192] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[193] P. Dayan,et al. Actions, Policies, Values, and the Basal Ganglia , 2005 .

[194] Colin Camerer,et al. Behavioral Economics: Past, Present, Future , 2003 .

[195] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[196] N. Daw,et al. Reinforcement learning models of the dopamine system and their behavioral implications , 2003 .

[197] B. Knowlton,et al. Learning and memory functions of the Basal Ganglia. , 2002, Annual review of neuroscience.

[198] S. Salloway,et al. The frontal lobes and neuropsychiatric illness , 2001 .

[199] J. E. Mazur,et al. Hyperbolic value addition and general models of animal choice. , 2001, Psychological review.

[200] A. Dickinson,et al. Neuronal coding of prediction errors. , 2000, Annual review of neuroscience.

[201] J. Metcalfe,et al. A hot/cool-system analysis of delay of gratification: dynamics of willpower. , 1999, Psychological review.

[202] G. Schoenbaum,et al. Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning , 1998, Nature Neuroscience.

[203] Ronald E. Parr,et al. Hierarchical control and learning for markov decision processes , 1998 .

[204] M. Domjan. The principles of learning and behavior, 4th ed. , 1998 .

[205] S. Haber,et al. The interface between dopamine neurons and the amygdala: implications for schizophrenia. , 1997, Schizophrenia bulletin.

[206] A. Barto,et al. Adaptive Critics and the Basal Ganglia , 1994 .

[207] Joel L. Davis,et al. Adaptive Critics and the Basal Ganglia , 1995 .

[208] A. Dickinson. CHAPTER 3 – Instrumental Conditioning , 1994 .

[209] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.

[210] Joel L. Davis,et al. A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement , 1994 .

[211] Richard S. Sutton,et al. Time-Derivative Models of Pavlovian Reinforcement , 1990 .

[212] W. Schultz,et al. Responses of nigrostriatal dopamine neurons to high-intensity somatosensory stimulation in the anesthetized monkey. , 1987, Journal of neurophysiology.

[213] J. E. Mazur. An adjusting procedure for studying delayed reinforcement. , 1987 .

[214] G. E. Alexander,et al. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. , 1986, Annual review of neuroscience.

[215] M. Domjan. The principles of learning and behavior , 1982 .

[216] G. Monahan. State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms , 1982 .

[217] A. Tversky,et al. Prospect theory: analysis of decision under risk , 1979 .

[218] R. Rescorla,et al. A theory of Pavlovian conditioning : Variations in the effectiveness of reinforcement and nonreinforcement , 1972 .

[219] D. Bernoulli. Exposition of a New Theory on the Measurement of Risk , 1954 .

[220] F. W. Irwin. Purposive Behavior in Animals and Men , 1932, The Psychological Clinic.

[221] F. Knight. The economic nature of the firm: From Risk, Uncertainty, and Profit , 2009 .