Convergence of the Continuous Time Trajectories of Isotropic Evolution Strategies on Monotonic $\mathcal{C}^2$-composite Functions

Information-Geometric Optimization (IGO) has been introduced as a unified framework for stochastic search algorithms. Given a parametrized family of probability distributions on the search space, IGO turns an arbitrary optimization problem on the search space into an optimization problem on the parameter space of that family and defines a natural gradient ascent on this parameter space. From the natural gradient defined over the entire parameter space we obtain continuous-time trajectories, which are the solutions of an ordinary differential equation (ODE). Via discretization, IGO naturally defines an iterated gradient ascent algorithm. Depending on the chosen distribution family, IGO recovers several known algorithms, such as the pure rank-μ update CMA-ES; consequently, the continuous-time IGO trajectory can be viewed as an idealization of the original algorithm. In this paper we study the continuous-time trajectories of IGO for the family of isotropic Gaussian distributions. These trajectories are a deterministic continuous-time model of the underlying evolution strategy in the limit of the population size tending to infinity and the change rates tending to zero. On functions that are the composite of a monotone function with a convex quadratic function, we prove global convergence of the solution of the ODE towards the global optimum. We extend this result to composites of monotone functions with twice continuously differentiable functions and prove local convergence towards local optima.
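For orientation, the IGO flow referred to above can be sketched in the notation commonly used in the IGO literature; the weight function $w$ and the quantile-based weight $W_f^\theta$ below are part of this illustration and are not defined in the abstract itself:
$$
\frac{\mathrm{d}\theta^t}{\mathrm{d}t} \;=\; \widetilde{\nabla}_{\theta}\,\left.\int W_f^{\theta^t}(x)\, P_\theta(\mathrm{d}x)\right|_{\theta=\theta^t},
\qquad
W_f^{\theta}(x) \;=\; w\!\bigl(\Pr_{x'\sim P_\theta}\![\,f(x')\le f(x)\,]\bigr),
$$
where $\widetilde{\nabla}_\theta$ denotes the natural gradient induced by the Fisher information metric of the family $\{P_\theta\}$. For the isotropic Gaussian family considered here, $P_\theta = \mathcal{N}(m, \sigma^2 I)$ with, e.g., $\theta = (m, \sigma)$.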
