Discovering Evolution Strategies via Meta-Black-Box Optimization

Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are often heuristic and inflexible, precisely the limitations that meta-learning can address. Hence, we propose to discover effective update rules for evolution strategies via meta-learning. Concretely, our approach employs a search strategy parametrized by a self-attention-based architecture, which guarantees that the update rule is invariant to the ordering of the candidate solutions. We show that meta-evolving this system on a small set of representative low-dimensional analytic optimization problems is sufficient to discover new evolution strategies capable of generalizing to unseen optimization problems, population sizes, and optimization horizons. Furthermore, the same learned evolution strategy can outperform established neuroevolution baselines on supervised and continuous control tasks. As additional contributions, we ablate the individual neural network components of our method; reverse-engineer the learned strategy into an explicit heuristic form, which remains highly competitive; and show that it is possible to self-referentially train an evolution strategy from scratch, with the learned update rule used to drive the outer meta-learning loop.
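The permutation-invariance property claimed above can be illustrated with a minimal sketch: a single self-attention layer over per-candidate fitness features produces recombination weights, and the resulting weighted-mean update does not depend on the order in which candidates are presented. All names, shapes, and weight matrices below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def es_update(x, feats, Wq, Wk, Wv):
    """Toy attention-based ES recombination.

    x     : (N, D) candidate solutions
    feats : (N, d) per-candidate fitness features
    Wq, Wk: (d, d) query/key projections (hypothetical)
    Wv    : (d, 1) value projection (hypothetical)
    Returns a (D,) weighted-mean update over the population.
    """
    q, k = feats @ Wq, feats @ Wk
    # Row-wise attention over candidates; permuting the rows of
    # `feats` permutes rows and columns of `scores` consistently.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Scalar score per candidate, turned into recombination weights.
    recomb = softmax((attn @ (feats @ Wv)).squeeze(-1))
    # Weighted mean is invariant to any joint permutation of rows.
    return recomb @ x
```

Because every operation is either per-candidate or a symmetric aggregation across candidates, jointly shuffling the rows of `x` and `feats` leaves the returned update unchanged, which is the invariance the self-attention parametrization is meant to guarantee.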
