Intrinsic Motivation and Reinforcement Learning

Psychologists distinguish between extrinsically motivated behavior, which is undertaken to achieve some externally supplied reward, such as a prize, a high grade, or a high-paying job, and intrinsically motivated behavior, which is done for its own sake. Is an analogous distinction meaningful for machine learning systems? Can we say of a machine learning system that it is motivated to learn, and if so, is it possible to provide it with an analog of intrinsic motivation? Although a formal distinction between extrinsic and intrinsic motivation is elusive, this chapter argues that the answer to both questions is assuredly “yes” and that the machine learning framework of reinforcement learning is particularly appropriate for bringing learning together with what in animals one would call motivation. It is commonly thought that a reinforcement learning agent’s reward must be extrinsic because the agent has a distinct input channel for reward signals. Nevertheless, reinforcement learning provides a natural framework for incorporating principles of intrinsic motivation.
