Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives

We analyze the convergence rate of various momentum-based optimization algorithms from a dynamical systems point of view. Our analysis exploits fundamental topological properties, such as the continuous dependence of iterates on their initial conditions, to provide a simple characterization of convergence rates. In many cases, closed-form expressions are obtained that relate algorithm parameters to the convergence rate. The analysis covers discrete-time and continuous-time formulations, both time-invariant and time-varying, and is not restricted to convex or Euclidean settings. In addition, the article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms, and it provides a characterization of algorithms that exhibit accelerated convergence.
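
To make the class of algorithms concrete, the sketch below contrasts Polyak's heavy-ball iteration with a semi-implicit (symplectic) Euler discretization of the momentum dynamics x'' + d*x' + grad f(x) = 0. This is a minimal illustration, not the article's method: the quadratic test function, step size, damping coefficient, and function names are assumptions made for this sketch only.

```python
import numpy as np


def grad_f(x):
    """Gradient of an illustrative strongly convex quadratic f(x) = 0.5 * x^T A x."""
    A = np.diag([1.0, 10.0])
    return A @ x


def heavy_ball(x0, step=0.05, momentum=0.8, iters=200):
    """Polyak's heavy-ball method:
    x_{k+1} = x_k - step * grad f(x_k) + momentum * (x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = x - step * grad_f(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x


def symplectic_euler(x0, step=0.05, damping=2.0, iters=200):
    """Semi-implicit (symplectic) Euler discretization of the momentum dynamics
    dx/dt = v,  dv/dt = -damping * v - grad f(x):
    the velocity is updated first, and the fresh velocity is used to update x."""
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(iters):
        v = v + step * (-damping * v - grad_f(x))  # velocity update (explicit in x)
        x = x + step * v                           # position update (uses the new v)
    return x


if __name__ == "__main__":
    x0 = np.array([5.0, -3.0])
    print("heavy ball:      ", heavy_ball(x0))
    print("symplectic Euler:", symplectic_euler(x0))
```

The design choice being illustrated is the update order in the discretization: updating the velocity before the position (rather than both from the old state, as in explicit Euler) is what gives the scheme its structure-preserving character, which is the kind of property the article connects to the behavior of momentum methods.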
