Effective reinforcement learning through evolutionary surrogate-assisted prescription

There is now significant historical data available on decision-making in organizations, consisting of the decision problem, what decisions were made, and how desirable the outcomes were. Using this data, it is possible to learn a surrogate model and, with that model, evolve a decision strategy that optimizes the outcomes. This paper introduces such a general approach, called Evolutionary Surrogate-Assisted Prescription, or ESP. The surrogate is, for example, a random forest or a neural network trained with gradient descent, and the strategy is a neural network that is evolved to maximize the predictions of the surrogate model. ESP is further extended in this paper to sequential decision-making tasks, which makes it possible to evaluate the framework on reinforcement learning (RL) benchmarks. Because the majority of evaluations are done on the surrogate, ESP is more sample-efficient and has lower variance and lower regret than standard RL approaches. Surprisingly, its solutions are also better because both the surrogate and the strategy network regularize the decision-making behavior. ESP thus forms a promising foundation for decision optimization in real-world problems.
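To make the ESP loop concrete, below is a minimal Python sketch of the two-step cycle the abstract describes: fit a surrogate on historical (context, decision, outcome) data, then evolve a prescriptor against it. The random-forest surrogate, linear prescriptor, data shapes, and evolution hyperparameters are all illustrative assumptions, not the paper's exact configuration.

```python
# Minimal ESP sketch: surrogate fitting + prescriptor evolution.
# All data, shapes, and hyperparameters here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical historical log: context C, decision A, observed outcome O.
C = rng.normal(size=(1000, 4))      # decision-problem contexts
A = rng.normal(size=(1000, 2))      # decisions that were made
O = (C[:, :2] * A).sum(axis=1)      # outcomes; stands in for real logs

# Step 1: fit the surrogate predictor (context, decision) -> outcome.
surrogate = RandomForestRegressor(n_estimators=100).fit(np.hstack([C, A]), O)

def decisions(weights, contexts):
    """Prescriptor: here a single linear layer mapping context -> decision."""
    return contexts @ weights.reshape(4, 2)

def fitness(weights, contexts):
    """Mean outcome the surrogate predicts for the prescriptor's decisions."""
    X = np.hstack([contexts, decisions(weights, contexts)])
    return surrogate.predict(X).mean()

# Step 2: evolve the prescriptor against the surrogate (simple (mu+lambda)-ES),
# so that no real-world evaluations are needed during optimization.
pop = rng.normal(size=(32, 8))
for gen in range(50):
    scores = np.array([fitness(w, C) for w in pop])
    parents = pop[np.argsort(scores)[-8:]]                 # keep the top 8
    children = np.repeat(parents, 3, axis=0) + 0.1 * rng.normal(size=(24, 8))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(w, C) for w in pop])]
```

In the full method, the best evolved prescriptor would then be deployed, and the new (context, decision, outcome) data it generates folded back in to retrain the surrogate, closing the loop.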
