Diagonal Acceleration for Covariance Matrix Adaptation Evolution Strategies

We introduce an acceleration for covariance matrix adaptation evolution strategies (CMA-ES) by means of adaptive diagonal decoding (dd-CMA). This diagonal acceleration endows the default CMA-ES with the advantages of separable CMA-ES without inheriting its drawbacks. Technically, we introduce a diagonal matrix D that expresses coordinate-wise variances of the sampling distribution in DCD form. The diagonal matrix can learn a rescaling of the problem in the coordinates within a linear number of function evaluations. Diagonal decoding can also exploit separability of the problem, but, crucially, does not compromise the performance on nonseparable problems. The latter is accomplished by modulating the learning rate for the diagonal matrix based on the condition number of the underlying correlation matrix. dd-CMA-ES not only combines the advantages of default and separable CMA-ES, but may achieve overadditive speedup: it improves the performance, and even the scaling, of the better of default and separable CMA-ES on classes of nonseparable test functions that reflect, arguably, a landscape feature commonly observed in practice. The article makes two further secondary contributions: we introduce two different approaches to guarantee positive definiteness of the covariance matrix with active CMA, which is valuable in particular with large population size; we revise the default parameter setting in CMA-ES, proposing accelerated settings in particular for large dimension. All our contributions can be viewed as independent improvements of CMA-ES, yet they are also complementary and can be seamlessly combined. In numerical experiments with dd-CMA-ES up to dimension 5120, we observe remarkable improvements over the original covariance matrix adaptation on functions with coordinate-wise ill-conditioning. The improvement is observed also for large population sizes up to about dimension squared.
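The two mechanisms named in the abstract, sampling with a covariance in DCD form and damping the learning rate of D by the condition number of the correlation matrix, can be illustrated with a minimal NumPy sketch. This is not the article's implementation: the function names, the `threshold` parameter, and the specific damping formula are assumptions chosen for illustration, and the updates of C and D themselves are omitted.

```python
import numpy as np


def correlation_matrix(C):
    """Return the correlation matrix of a covariance matrix C."""
    s = np.sqrt(np.diag(C))
    return C / np.outer(s, s)


def sample_dcd(mean, sigma, D, C, n_samples, rng):
    """Draw rows from N(mean, sigma^2 * D C D).

    D is passed as a 1-D array holding the diagonal entries, so
    multiplying by D rescales each coordinate independently.
    """
    A = np.linalg.cholesky(C)               # C = A A^T
    z = rng.standard_normal((n_samples, len(mean)))
    y = z @ A.T                             # rows ~ N(0, C)
    return mean + sigma * y * D             # rows ~ N(mean, sigma^2 D C D)


def diagonal_learning_rate_damping(C, threshold=2.0):
    """Illustrative damping factor in (0, 1] for the learning rate of D.

    Assumed rule (hypothetical, for illustration only): the factor is 1
    while the correlation matrix of C is well conditioned and shrinks as
    its condition number grows, so that D adapts quickly only when C
    carries little correlation structure.
    """
    cond = np.linalg.cond(correlation_matrix(C))
    return 1.0 / max(1.0, np.sqrt(cond) - threshold + 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    D = np.array([1.0, 100.0])              # coordinate-wise scaling
    C = np.eye(2)                           # no correlations learned yet
    X = sample_dcd(np.zeros(2), 0.5, D, C, 5, rng)
    print(X.shape)
    print(diagonal_learning_rate_damping(C))
```

With an identity C the damping factor is 1, so the diagonal matrix would be updated at its full learning rate; with a strongly correlated C the factor drops well below 1, which is the sense in which the diagonal update is held back on nonseparable problems.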
