Genetic Programming for Reward Function Search

Reward functions in reinforcement learning have largely been treated as given, fixed in advance as part of the problem the agent must solve. However, the psychological notion of intrinsic motivation has recently inspired inquiry into whether there exist alternate reward functions that enable an agent to learn a task more easily than the natural task-based reward function allows. This paper presents a genetic programming algorithm that searches for such alternate reward functions with the aim of improving agent learning performance. We present experiments showing that the evolved reward functions outperform the natural task-based reward, offer evidence that the method can scale, and identify three classes of problems where reward function search may be particularly useful: distributions of environments, nonstationary environments, and problems with short agent lifetimes.
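
The core loop the abstract describes — propose candidate reward functions, train an agent on each candidate, and score candidates by the agent's performance on the original task — can be sketched compactly. The following Python sketch is a hypothetical illustration only: the corridor environment, the primitive feature set, the tabular Q-learner, and all GP parameters are assumptions made for the example, not details taken from the paper.

```python
import random

# Illustrative sketch (not the paper's actual setup): evolve reward functions
# with genetic programming, scoring each candidate by how well a Q-learning
# agent learns the underlying task when trained on that candidate reward.

# A toy episodic task: walk a 1-D corridor and reach the goal cell.
SIZE, GOAL, EPISODES, STEPS = 8, 7, 30, 40

# GP representation: reward functions are small expression trees over
# observed features of a transition (state s, action a, next state s2).
PRIMS = [
    ("dist",  lambda s, a, s2: -abs(GOAL - s2)),                  # distance to goal
    ("delta", lambda s, a, s2: abs(GOAL - s) - abs(GOAL - s2)),   # progress made
    ("goal",  lambda s, a, s2: 1.0 if s2 == GOAL else 0.0),       # raw task reward
    ("const", lambda s, a, s2: 1.0),
]

def random_tree(depth=2):
    """A tree is either a primitive leaf or a weighted sum of two subtrees."""
    if depth == 0 or random.random() < 0.4:
        return ("leaf", random.randrange(len(PRIMS)))
    return ("add", random.uniform(-1, 1), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, s, a, s2):
    if tree[0] == "leaf":
        return PRIMS[tree[1]][1](s, a, s2)
    _, w, left, right = tree
    return evaluate(left, s, a, s2) + w * evaluate(right, s, a, s2)

def mutate(tree):
    """Crude mutation: occasionally replace the whole tree with a fresh one."""
    return random_tree() if random.random() < 0.3 else tree

def fitness(tree):
    """Train a tabular Q-learner on the *evolved* reward, but score the
    candidate by the *task* outcome: the number of episodes reaching the goal."""
    Q = [[0.0, 0.0] for _ in range(SIZE)]
    reached = 0
    for _ in range(EPISODES):
        s = 0
        for _ in range(STEPS):
            a = random.randrange(2) if random.random() < 0.1 else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, min(SIZE - 1, s + (1 if a else -1)))
            r = evaluate(tree, s, a, s2)   # evolved internal reward, not task reward
            Q[s][a] += 0.5 * (r + 0.9 * max(Q[s2]) - Q[s][a])
            s = s2
            if s == GOAL:
                reached += 1
                break
    return reached

# Generational GP loop: elitism plus mutation of good individuals.
random.seed(0)
pop = [random_tree() for _ in range(20)]
for gen in range(10):
    scored = sorted(pop, key=fitness, reverse=True)
    pop = scored[:5] + [mutate(random.choice(scored[:10])) for _ in range(15)]
print("best fitness:", fitness(pop[0]))
```

Note the key inversion this sketch is meant to show: the agent maximizes the evolved internal reward, while the GP fitness counts successes under the original task reward, so the search is free to discover reward functions that merely make the task easier to learn.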
