Adaptive Averaging in Accelerated Descent Dynamics

We study accelerated descent dynamics for constrained convex optimization. These dynamics can be described naturally as a coupling of a dual variable that accumulates gradients at a rate $\eta(t)$ and a primal variable obtained as the weighted average, with weights $w(t)$, of the mirrored dual trajectory. Using a Lyapunov argument, we give sufficient conditions on $\eta$ and $w$ to achieve a desired convergence rate. As an example, we show that the replicator dynamics (an instance of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic that computes the weights online to speed up the decrease of the Lyapunov function. We give guarantees on adaptive averaging in continuous time, prove that it preserves the quadratic convergence rate of accelerated first-order methods in discrete time, and present numerical experiments comparing it with existing heuristics such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases.
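To make the coupling above concrete, the following is a minimal sketch of the ODE system it describes; the notation beyond $\eta(t)$ and $w(t)$ is assumed rather than fixed by the text: $f$ is the objective, $\psi^*$ is the conjugate of a mirror map $\psi$ (so $\nabla\psi^*$ maps dual points into the primal feasible set), $Z$ is the dual variable, $X$ is the primal variable, and $W(t) = \int_0^t w(s)\,ds$ is the cumulative weight:

$$\dot Z(t) = -\eta(t)\,\nabla f(X(t)), \qquad X(t) = \frac{1}{W(t)} \int_0^t w(s)\,\nabla\psi^*\big(Z(s)\big)\,ds.$$

Differentiating the averaging relation gives the equivalent coupled form $\dot X(t) = \frac{w(t)}{W(t)}\big(\nabla\psi^*(Z(t)) - X(t)\big)$, which makes explicit how larger weights $w(t)$ pull the primal trajectory more strongly toward the current mirrored dual point.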
