On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class includes ReLU neural networks and other networks with non-differentiable activation functions. We first show that finding an $\epsilon$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(\delta, \epsilon)$-stationarity, which allows an $\epsilon$-approximate gradient to be a convex combination of generalized gradients evaluated at points within distance $\delta$ of the solution. We propose a series of randomized first-order methods and analyze their complexity for finding a $(\delta, \epsilon)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $\delta$. Empirically, our methods perform well for training ReLU neural networks.
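The notion of $(\delta, \epsilon)$-stationarity described above admits a compact formalization via the Goldstein $\delta$-subdifferential; the display below is a sketch consistent with the abstract's wording rather than a verbatim statement from the paper. A point $x$ is $(\delta, \epsilon)$-stationary for a locally Lipschitz function $f$ if
$$
\min\bigl\{ \|g\| : g \in \partial_{\delta} f(x) \bigr\} \le \epsilon,
\qquad
\partial_{\delta} f(x) := \mathrm{conv}\Bigl( \textstyle\bigcup_{y \in \mathbb{B}(x,\delta)} \partial f(y) \Bigr),
$$
where $\partial f(y)$ is the Clarke generalized gradient and $\mathbb{B}(x,\delta)$ is the closed ball of radius $\delta$ centered at $x$. In words, some convex combination of generalized gradients taken at points within distance $\delta$ of $x$ has norm at most $\epsilon$.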
