From Local SGD to Local Fixed-Point Methods for Federated Learning

Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computations done locally on a mobile device. We investigate two strategies to reach consensus on such a fixed point: one based on a fixed number of local steps, and the other based on randomized computations. In both cases, the goal is to limit communication of the locally computed variables, which is often the bottleneck in distributed frameworks. We perform a convergence analysis of both methods and conduct a number of experiments highlighting the benefits of our approach.
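
As a rough illustration of the local-steps strategy, the sketch below applies each local operator for a fixed number of steps and averages the resulting iterates once per communication round. It is a minimal sketch, not the paper's exact algorithm: the function name `local_fixed_point`, the toy quadratic operators, and the step size `gamma` are illustrative assumptions.

```python
import numpy as np

def local_fixed_point(operators, x0, local_steps=10, rounds=100):
    """Sketch of local fixed-point iterations with periodic averaging.

    Each device repeatedly applies its local operator T_i for `local_steps`
    iterations without communicating; the server then averages the local
    iterates (one communication per round). With more than one local step
    and heterogeneous operators, this converges only to an approximate
    fixed point of the averaged operator (the "approximation thereof"
    mentioned in the abstract).
    """
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_iterates = []
        for T in operators:
            z = x.copy()
            for _ in range(local_steps):   # local computation, no communication
                z = T(z)
            local_iterates.append(z)
        x = np.mean(local_iterates, axis=0)  # single averaging/communication step
    return x

# Toy usage: each T_i is the gradient-descent operator of a local quadratic,
# so the fixed point of the averaged operator minimizes the average loss.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = [np.diag(rng.uniform(0.5, 2.0, size=5)) for _ in range(4)]
    b = [rng.normal(size=5) for _ in range(4)]
    gamma = 0.1  # illustrative step size
    ops = [lambda z, A_i=A_i, b_i=b_i: z - gamma * (A_i @ z - b_i)
           for A_i, b_i in zip(A, b)]
    print(local_fixed_point(ops, np.zeros(5)))
```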
