Direct Runge-Kutta Discretization Achieves Acceleration

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is sufficiently smooth, we show that acceleration can be achieved by a stable discretization of this ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lipschitz-gradient, convexity, and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}(N^{-2s/(s+1)})$, where $s$ is the order of the Runge-Kutta numerical integrator. Furthermore, we introduce a new local flatness condition on the objective under which rates even faster than $\mathcal{O}(N^{-2})$ can be achieved with low-order integrators and only gradient information. Notably, this flatness condition is satisfied by several standard loss functions used in machine learning. We provide numerical experiments that verify the theoretical rates predicted by our results.
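
To make the recipe concrete, here is a minimal Python sketch (not the paper's exact scheme) that applies the classical fourth-order Runge-Kutta method to the Su-Boyd-Candes ODE $\ddot{X} + \frac{3}{t}\dot{X} + \nabla f(X) = 0$, the continuous-time limit of Nesterov's method to which the studied ODE is related. The function names `accelerated_flow` and `rk4_step`, the quadratic test objective, and the step size are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def accelerated_flow(x, v, t, grad_f):
    """Right-hand side of the first-order system equivalent to
    x'' + (3/t) x' + grad_f(x) = 0 (the Su-Boyd-Candes ODE)."""
    return v, -(3.0 / t) * v - grad_f(x)

def rk4_step(x, v, t, h, grad_f):
    """One step of the classical fourth-order Runge-Kutta integrator (s = 4)."""
    k1x, k1v = accelerated_flow(x, v, t, grad_f)
    k2x, k2v = accelerated_flow(x + 0.5 * h * k1x, v + 0.5 * h * k1v, t + 0.5 * h, grad_f)
    k3x, k3v = accelerated_flow(x + 0.5 * h * k2x, v + 0.5 * h * k2v, t + 0.5 * h, grad_f)
    k4x, k4v = accelerated_flow(x + h * k3x, v + h * k3v, t + h, grad_f)
    x_new = x + (h / 6.0) * (k1x + 2 * k2x + 2 * k3x + k4x)
    v_new = v + (h / 6.0) * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x_new, v_new

# Example: minimize the smooth convex quadratic f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 10.0])
grad_f = lambda x: A @ x

x, v, t, h = np.array([1.0, 1.0]), np.zeros(2), 1.0, 0.05  # start at t0 > 0 to avoid the 3/t singularity
for _ in range(200):
    x, v = rk4_step(x, v, t, h, grad_f)
    t += h
print(x)  # approaches the minimizer (the origin)
```

Only gradient evaluations are used (four per iteration for RK4), consistent with the paper's point that a standard explicit integrator with a suitable step size suffices for acceleration.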
