Three Operator Splitting with a Nonconvex Loss Function

We consider the problem of minimizing the sum of three functions, one of which is nonconvex but differentiable, while the other two are convex but possibly nondifferentiable. We investigate the Three Operator Splitting method (TOS) of Davis & Yin (2017) with the aim of extending its theoretical guarantees to this nonconvex problem template. In particular, we prove convergence of TOS with nonasymptotic bounds on its nonstationarity and infeasibility errors. In contrast to the existing work on nonconvex TOS, our guarantees do not require additional smoothness assumptions on the terms comprising the objective; hence they cover instances of particular interest where the nondifferentiable terms are indicator functions. We also extend our results to a stochastic setting where we have access only to an unbiased estimator of the gradient. Finally, we illustrate the effectiveness of the proposed method through numerical experiments on quadratic assignment problems.
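To make the problem template concrete, the following is a minimal sketch of the standard Davis-Yin TOS iteration applied to a toy instance. Everything in it is an illustrative assumption rather than the paper's experimental setup: the smooth term is a random, typically indefinite quadratic, the two nondifferentiable terms are indicator functions of a box and of a hyperplane, and the fixed step size `0.5 / L` is a hypothetical choice that is not claimed to match the step-size rule analyzed in the paper.

```python
import numpy as np

def project_box(v, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^n, i.e. the prox of its indicator."""
    return np.clip(v, lo, hi)

def project_sum_to_one(v):
    """Projection onto the hyperplane {x : sum(x) = 1}."""
    return v - (v.sum() - 1.0) / v.size

def three_operator_splitting(grad_h, prox_g, prox_f, z0, step, n_iters):
    """Davis-Yin iteration for min_x f(x) + g(x) + h(x), with h smooth.

    prox_g and prox_f are projections here, so they do not depend on the
    step size. Assumes n_iters >= 1. Returns the last pair (x_g, x_f).
    """
    z = z0.copy()
    for _ in range(n_iters):
        x_g = prox_g(z)                                   # backward step on g
        x_f = prox_f(2.0 * x_g - z - step * grad_h(x_g))  # forward step on h, backward on f
        z = z + (x_f - x_g)                               # update the auxiliary variable
    return x_g, x_f

# Toy instance (assumption): h(x) = 0.5 * x^T A x with A symmetric and
# typically indefinite, so h is smooth but generally nonconvex.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)
L = np.linalg.norm(A, 2)  # Lipschitz constant of the gradient x -> A x

x_box, x_plane = three_operator_splitting(
    grad_h=lambda x: A @ x,
    prox_g=project_box,
    prox_f=project_sum_to_one,
    z0=np.full(n, 1.0 / n),
    step=0.5 / L,          # hypothetical fixed step size
    n_iters=500,
)

# x_box lies in the box and x_plane on the hyperplane by construction;
# the distance between them is one way to measure infeasibility.
infeasibility = np.linalg.norm(x_box - x_plane)
```

Each of the two returned points is feasible for only one of the constraint sets, which is why a bound on the infeasibility error is tracked alongside the nonstationarity error.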

[1]  A. Volgenant,et al.  A shortest augmenting path algorithm for dense and sparse linear assignment problems , 1987, Computing.

[2]  Carey E. Priebe,et al.  Fast Approximate Quadratic Programming for Graph Matching , 2015, PloS one.

[3]  Marco Cuturi,et al.  Computational Optimal Transport: With Applications to Data Science , 2019 .

[4]  Simon Lacoste-Julien,et al.  Convergence Rate of Frank-Wolfe for Non-Convex Objectives , 2016, ArXiv.

[5]  Gerhard J. Woeginger,et al.  The Traveling Salesman Problem Under Squared Euclidean Distances , 2010, STACS.

[6]  L. Briceño-Arias Forward-Douglas–Rachford splitting and forward-partial inverse method for solving monotone inclusions , 2012, 1212.5942.

[7]  Martin Jaggi,et al.  Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.

[8]  Yura Malitsky,et al.  Golden ratio algorithms for variational inequalities , 2018, Mathematical Programming.

[9]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[10]  Volkan Cevher,et al.  A totally unimodular view of structured sparsity , 2014, AISTATS.

[11]  Francis Bach,et al.  SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.

[12]  Bingsheng He,et al.  On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers , 2014, Numerische Mathematik.

[13]  Fabian Pedregosa,et al.  Proximal Splitting Meets Variance Reduction , 2018, AISTATS.

[14]  J. Munkres Algorithms for the Assignment and Transportation Problems , 1957 .

[15]  Esa Ollila,et al.  Regularized $M$ -Estimators of Scatter Matrix , 2014, IEEE Transactions on Signal Processing.

[16]  Tong Zhang,et al.  SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator , 2018, NeurIPS.

[17]  Yao Lu,et al.  A fast projected fixed-point algorithm for large graph matching , 2012, Pattern Recognit..

[18]  Mohamed-Jalal Fadili,et al.  A Generalized Forward-Backward Splitting , 2011, SIAM J. Imaging Sci..

[19]  A. Hoffmann The Distance to the Intersection of Two Convex Sets Expressed by the Distances to Each of Them , 1992 .

[20]  Jason D. Lee,et al.  Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods , 2019, NeurIPS.

[21]  Francis R. Bach,et al.  Spectral Norm Regularization of Orthonormal Representations for Graph Transduction , 2015, NIPS.

[22]  Laurent Condat Fast projection onto the simplex and the $\ell_1$ ball , 2015, Mathematical Programming.

[23]  Panagiotis Patrinos,et al.  Forward-Backward Envelope for the Sum of Two Nonconvex Functions: Further Properties and Nonmonotone Linesearch Algorithms , 2016, SIAM J. Optim..

[24]  Chiranjib Bhattacharyya,et al.  Convex Optimization over Intersection of Simple Sets: improved Convergence Rate Guarantees via an Exact Penalty Approach , 2017, AISTATS.

[25]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[26]  Damek Davis,et al.  A Three-Operator Splitting Scheme and its Optimization Applications , 2015, 1504.01032.

[27]  Volkan Cevher,et al.  Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator , 2019, ICML.

[28]  Nicholas J. Higham,et al.  Anderson acceleration of the alternating projections method for computing the nearest correlation matrix , 2016, Numerical Algorithms.

[29]  Amnon Shashua,et al.  Doubly Stochastic Normalization for Spectral Clustering , 2006, NIPS.

[30]  Wotao Yin,et al.  An Envelope for Davis–Yin Splitting and Strict Saddle-Point Avoidance , 2018, J. Optim. Theory Appl..

[31]  Mark W. Schmidt,et al.  A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets , 2012, NIPS.

[32]  Alexander J. Smola,et al.  Stochastic Frank-Wolfe methods for nonconvex optimization , 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[33]  Volkan Cevher,et al.  Stochastic Three-Composite Convex Minimization , 2017, NIPS.

[34]  V. Yohai,et al.  Robust Statistics: Theory and Methods , 2006 .

[35]  M. Zaslavskiy,et al.  A Path Following Algorithm for the Graph Matching Problem , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Volkan Cevher,et al.  Stochastic Three-Composite Convex Minimization with a Linear Operator , 2018, AISTATS.

[37]  Franz Rendl,et al.  QAPLIB – A Quadratic Assignment Problem Library , 1997, J. Glob. Optim..

[38]  Philip Wolfe,et al.  An algorithm for quadratic programming , 1956 .

[39]  Jie Liu,et al.  SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient , 2017, ICML.

[40]  Tong Zhang,et al.  Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[41]  Gauthier Gidel,et al.  Adaptive Three Operator Splitting , 2018, ICML.

[42]  Volkan Cevher,et al.  A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming , 2018, ICML.

[43]  Alexander J. Smola,et al.  Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.

[44]  Nicolas Vayatis,et al.  Estimation of Simultaneously Sparse and Low Rank Matrices , 2012, ICML.

[45]  Xiaoqun Zhang,et al.  A Three-Operator Splitting Algorithm for Nonconvex Sparsity Regularization , 2020, SIAM J. Sci. Comput..

[46]  Yeol Je Cho,et al.  Convergence Analysis of an Inexact Three-Operator Splitting Algorithm , 2018, Symmetry.

[47]  Zhengyuan Zhou,et al.  Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities , 2021, NeurIPS.

[48]  Jean-Philippe Vert,et al.  Group lasso with overlap and graph lasso , 2009, ICML '09.

[49]  Amnon Shashua,et al.  Nonnegative Sparse PCA , 2006, NIPS.

[50]  Alberto Bemporad,et al.  Douglas-Rachford splitting: Complexity estimates and accelerated variants , 2014, 53rd IEEE Conference on Decision and Control.

[51]  Abdel Nasser,et al.  A Survey of the Quadratic Assignment Problem , 2014 .

[52]  T. Koopmans,et al.  Assignment Problems and the Location of Economic Activities , 1957 .

[53]  Teofilo F. Gonzalez,et al.  P-Complete Approximation Problems , 1976, J. ACM.

[54]  Suvrit Sra,et al.  Modular Proximal Optimization for Multidimensional Total-Variation Regularization , 2014, J. Mach. Learn. Res..