From Local SGD to Local Fixed-Point Methods for Federated Learning

Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computations done locally on a mobile device. We investigate two strategies to reach consensus on such a fixed point: one based on a fixed number of local steps, and the other based on randomized computations. In both cases, the goal is to limit communication of the locally computed variables, which is often the bottleneck in distributed frameworks. We perform a convergence analysis of both methods and conduct a number of experiments highlighting the benefits of our approach.
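
As a rough illustration of the local-steps strategy, the sketch below applies each local operator for a fixed number of steps and averages the resulting iterates once per communication round. It is a minimal sketch, not the paper's exact algorithm: the function name `local_fixed_point`, the toy quadratic operators, and the step size `gamma` are illustrative assumptions.

```python
import numpy as np

def local_fixed_point(operators, x0, local_steps=10, rounds=100):
    """Sketch of local fixed-point iterations with periodic averaging.

    Each device repeatedly applies its local operator T_i for `local_steps`
    iterations without communicating; the server then averages the local
    iterates (one communication per round). With more than one local step
    and heterogeneous operators, this converges only to an approximate
    fixed point of the averaged operator (the "approximation thereof"
    mentioned in the abstract).
    """
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_iterates = []
        for T in operators:
            z = x.copy()
            for _ in range(local_steps):   # local computation, no communication
                z = T(z)
            local_iterates.append(z)
        x = np.mean(local_iterates, axis=0)  # single averaging/communication step
    return x

# Toy usage: each T_i is the gradient-descent operator of a local quadratic,
# so the fixed point of the averaged operator minimizes the average loss.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = [np.diag(rng.uniform(0.5, 2.0, size=5)) for _ in range(4)]
    b = [rng.normal(size=5) for _ in range(4)]
    gamma = 0.1  # illustrative step size
    ops = [lambda z, A_i=A_i, b_i=b_i: z - gamma * (A_i @ z - b_i)
           for A_i, b_i in zip(A, b)]
    print(local_fixed_point(ops, np.zeros(5)))
```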
