Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective

There is great interest in building intrinsic motivation into artificial systems using the reinforcement learning framework. Yet what intrinsic motivation may mean computationally, and how it may differ from extrinsic motivation, remain murky and controversial questions. In this paper, we adopt an evolutionary perspective and define a new optimal reward framework that captures the pressure to design good primary reward functions that lead to evolutionary success across environments. The results of two computational experiments show that optimal primary reward signals may yield both emergent intrinsic and extrinsic motivation. The evolutionary perspective and the associated optimal reward framework thus lead to the conclusion that no hard-and-fast features distinguish intrinsic from extrinsic reward computationally. Rather, the directness of the relationship between rewarding behavior and evolutionary success varies along a continuum.
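The idea of searching for a primary reward function that maximizes fitness across environments can be illustrated with a toy sketch. This is not the paper's experimental setup: the bandit environments, the greedy learner, the novelty bonus `beta / sqrt(count)`, and the names `run_lifetime` and `search_reward` are all assumptions made for illustration. The agent learns from its internal reward, while fitness counts only the environment's payoffs, so any benefit of a nonzero `beta` comes solely from the behavior it induces.

```python
import random


def run_lifetime(beta, arm_probs, steps, rng):
    """Simulate one agent lifetime and return its fitness (total payoff).

    The agent is greedy with respect to a running mean of its *internal*
    reward, defined here (hypothetically) as payoff + beta / sqrt(visit count).
    """
    k = len(arm_probs)
    counts = [0] * k
    values = [0.0] * k  # running mean of internal reward per arm
    fitness = 0
    for _ in range(steps):
        # Greedy action selection, breaking ties at random.
        best = max(values)
        arm = rng.choice([a for a in range(k) if values[a] == best])
        payoff = 1 if rng.random() < arm_probs[arm] else 0
        fitness += payoff  # fitness ignores the novelty bonus
        counts[arm] += 1
        reward = payoff + beta / counts[arm] ** 0.5
        values[arm] += (reward - values[arm]) / counts[arm]
    return fitness


def search_reward(betas, n_envs=200, k=5, steps=50, seed=0):
    """Return the beta with the highest mean fitness over sampled environments."""
    rng = random.Random(seed)
    # Each environment is a bandit with k arms and random payoff probabilities.
    envs = [[rng.random() for _ in range(k)] for _ in range(n_envs)]
    mean_fitness = {}
    for beta in betas:
        total = sum(run_lifetime(beta, env, steps, rng) for env in envs)
        mean_fitness[beta] = total / n_envs
    best_beta = max(mean_fitness, key=mean_fitness.get)
    return best_beta, mean_fitness


if __name__ == "__main__":
    best, table = search_reward([0.0, 0.5, 1.0, 2.0])
    print("best beta:", best)
    print("mean fitness per beta:", table)
```

In this sketch, `beta = 0` corresponds to a reward tied directly to fitness, and larger values add a term that resembles curiosity. Which value wins depends on the environment distribution and the lifetime length, which is the sense in which the reward's relationship to fitness can be more or less direct.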
