Pavlovian-Instrumental Interaction in ‘Observing Behavior’

Subjects typically choose to be presented with stimuli that predict the existence of future reinforcements. This so-called ‘observing behavior’ is evident in many species under a variety of experimental conditions, including when the choice is costly, or when there is nothing subjects can do with the information gained to improve their lot. A recent study showed that the activities of putative midbrain dopamine neurons reflect this preference for observation in a way that appears to challenge the common prediction-error interpretation of these neurons. In this paper, we provide an alternative account according to which observing behavior arises from a small, possibly Pavlovian, bias associated with the operation of working memory.

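To make the puzzle concrete, the sketch below simulates standard temporal-difference (TD) learning on a stylized observing task: one branch delivers an informative cue that reveals in advance whether reward will arrive, the other delivers a single ambiguous cue followed by reward on half of the trials. The task structure, state names, and parameters (learning rate, reward probability, trial count) are illustrative assumptions rather than the model used in the paper; the point is only that TD value estimates for the two branches converge to the same number, so a pure prediction-error account, by itself, assigns no extra worth to observing.

```python
# Minimal TD(0) sketch of an 'observing' task (illustrative assumptions only):
# on each trial the agent enters either an 'informative' branch, where a cue
# reveals in advance whether reward will arrive, or an 'uninformative' branch,
# where a single ambiguous cue is followed by reward on half of the trials.
import random
from collections import defaultdict

ALPHA = 0.1      # learning rate (assumed)
GAMMA = 1.0      # no temporal discounting (assumed)
P_REWARD = 0.5   # probability of reward on any trial (assumed)

V = defaultdict(float)  # state-value estimates

def run_trial(informative: bool) -> None:
    """One trial: branch-entry state -> cue state -> terminal reward."""
    rewarded = random.random() < P_REWARD
    if informative:
        states = ["info_branch", "cue_reward" if rewarded else "cue_no_reward"]
    else:
        states = ["uninfo_branch", "cue_ambiguous"]
    reward = 1.0 if rewarded else 0.0

    # TD(0) updates along the trajectory; reward arrives only at the end.
    for i, s in enumerate(states):
        if i + 1 < len(states):
            delta = GAMMA * V[states[i + 1]] - V[s]   # transition to next cue
        else:
            delta = reward - V[s]                     # terminal transition
        V[s] += ALPHA * delta

random.seed(0)
for _ in range(20000):
    run_trial(informative=random.random() < 0.5)

print(f"V(informative branch)   ~ {V['info_branch']:.3f}")
print(f"V(uninformative branch) ~ {V['uninfo_branch']:.3f}")
# Both converge to ~0.5: TD value alone gives no reason to prefer observing.
```

Running the script prints values near 0.5 for both branches; any empirical preference for the informative option, and the corresponding dopamine signals, therefore calls for an additional ingredient, which the paper attributes to a small, possibly Pavlovian, bias tied to working memory.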