Continuous-time Lower Bounds for Gradient-based Algorithms

This article derives lower bounds on the convergence rate of continuous-time gradient-based optimization algorithms. To make the discussion of continuous-time convergence rates meaningful, the algorithms are subject to a time-normalization constraint that rules out speeding up convergence simply by reparametrizing time. We reduce the multi-dimensional problem to a single dimension, recover well-known lower bounds from the discrete-time setting, and provide insight into why these lower bounds occur. We further provide explicit algorithms that achieve the proposed lower bounds, even when the function class under consideration includes certain non-convex functions.
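The simplest example shows why some time normalization is needed. Consider the time-scaled gradient flow dx/dt = -c * f'(x) on the one-dimensional quadratic f(x) = mu * x^2 / 2. Its exact solution is x(t) = x0 * exp(-c * mu * t), so multiplying the vector field by any constant c > 0 multiplies the exponential rate by c. Without a normalization, any continuous-time rate could therefore be claimed. The sketch below integrates this flow numerically. It rests on illustrative assumptions: the quadratic, the scaling factor `c`, and the helper names `gradient_flow` and `quad_grad` are hypothetical choices, not the paper's algorithm class or its specific normalization.

```python
import math

MU = 1.0  # curvature of the illustrative quadratic f(x) = MU * x**2 / 2


def quad_grad(x: float) -> float:
    """Gradient of the illustrative quadratic f(x) = MU * x**2 / 2."""
    return MU * x


def gradient_flow(grad, x0: float, c: float, t_end: float, dt: float = 1e-4) -> float:
    """Approximate x(t_end) for dx/dt = -c * grad(x) using explicit Euler steps of size dt.

    c is a hypothetical time-scaling factor. Larger c means the same trajectory
    is traversed faster, which is exactly what a time normalization must rule out.
    """
    x = x0
    for _ in range(int(round(t_end / dt))):
        x = x - dt * c * grad(x)
    return x


if __name__ == "__main__":
    x0, t_end = 1.0, 1.0
    for c in (1.0, 10.0, 100.0):
        x_num = gradient_flow(quad_grad, x0, c, t_end)
        x_exact = x0 * math.exp(-c * MU * t_end)
        # The empirical rate -log(x(T)) / T grows linearly with c.
        print(f"c = {c:6.1f}: x(T) ~ {x_num:.3e}, exact {x_exact:.3e}, "
              f"rate ~ {-math.log(x_num) / t_end:.2f}")
```

The same trajectory is traced at every value of c; only the clock changes. This is why a rate stated for a continuous-time method says little unless the speed at which time is allowed to run is pinned down.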