Paper Information - On the Efficiency of the Sinkhorn and Greenkhorn Algorithms and Their Acceleration for Optimal Transport

We present new complexity results for several algorithms that approximately solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. First, we show that a greedy variant of the classical Sinkhorn algorithm, known as the \textit{Greenkhorn} algorithm, achieves a complexity bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$, which improves on the best previously known bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-3})$. Notably, this matches the best known complexity bound for the Sinkhorn algorithm and helps explain the superior performance of the Greenkhorn algorithm in practice. Furthermore, we generalize an adaptive primal-dual accelerated gradient descent (APDAGD) algorithm with a mirror mapping $\phi$ and show that the resulting \textit{adaptive primal-dual accelerated mirror descent} (APDAMD) algorithm achieves a complexity bound of $\widetilde{\mathcal{O}}(n^2\sqrt{\delta}\varepsilon^{-1})$, where $\delta>0$ depends on $\phi$. Using a simple counterexample, we show that an existing complexity bound for the APDAGD algorithm is not valid in general, and we then establish a complexity bound of $\widetilde{\mathcal{O}}(n^{5/2}\varepsilon^{-1})$ for it by exploiting the connection between the APDAMD and APDAGD algorithms. Moreover, we introduce accelerated Sinkhorn and Greenkhorn algorithms that achieve a complexity bound of $\widetilde{\mathcal{O}}(n^{7/3}\varepsilon^{-1})$, which improves on the $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$ bounds of the Sinkhorn and Greenkhorn algorithms in terms of $\varepsilon$. Experimental results on synthetic and real datasets demonstrate the favorable performance of the new algorithms in practice.
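To make the contrast between the two baseline methods concrete, the following is a minimal pure-Python sketch of Sinkhorn (which rescales every row and then every column in each pass) and Greenkhorn (which greedily rescales only the single row or column with the largest marginal violation). It is an illustration under stated assumptions, not the paper's implementation: the function names, the regularization parameter `eta`, the tolerance `tol`, and the default values are all hypothetical, the violation measure `rho(a, b) = b - a + a log(a/b)` follows the usual description of Greenkhorn, and the marginals are recomputed from scratch at every step for clarity instead of being updated incrementally.

```python
import math

def gibbs_kernel(C, eta):
    """Entrywise exp(-C / eta), the kernel of entropic-regularized OT."""
    return [[math.exp(-cost / eta) for cost in row] for row in C]

def plan(K, u, v):
    """Transport plan diag(u) K diag(v)."""
    return [[u[i] * K[i][j] * v[j] for j in range(len(v))] for i in range(len(u))]

def marginal_error(P, r, c):
    """l1 violation of the row and column marginal constraints."""
    rows = [sum(row) for row in P]
    cols = [sum(col) for col in zip(*P)]
    return (sum(abs(a - b) for a, b in zip(rows, r))
            + sum(abs(a - b) for a, b in zip(cols, c)))

def rho(a, b):
    """Violation measure used to pick the greedy coordinate (assumes b > 0)."""
    return b if a == 0 else b - a + a * math.log(a / b)

def sinkhorn(C, r, c, eta=0.5, tol=1e-8, max_iter=10000):
    K = gibbs_kernel(C, eta)
    n, m = len(r), len(c)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(max_iter):
        # Full pass: rescale all rows, then all columns.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        if marginal_error(plan(K, u, v), r, c) <= tol:
            break
    return plan(K, u, v)

def greenkhorn(C, r, c, eta=0.5, tol=1e-8, max_iter=100000):
    K = gibbs_kernel(C, eta)
    n, m = len(r), len(c)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(max_iter):
        P = plan(K, u, v)
        if marginal_error(P, r, c) <= tol:
            break
        # Simplification: marginals are recomputed in O(n^2) here; a practical
        # implementation would maintain them in O(n) per update.
        row_sums = [sum(row) for row in P]
        col_sums = [sum(col) for col in zip(*P)]
        i = max(range(n), key=lambda k: rho(r[k], row_sums[k]))
        j = max(range(m), key=lambda k: rho(c[k], col_sums[k]))
        # Greedy step: fix only the worst row or the worst column.
        if rho(r[i], row_sums[i]) >= rho(c[j], col_sums[j]):
            u[i] = r[i] / sum(K[i][k] * v[k] for k in range(m))
        else:
            v[j] = c[j] / sum(K[k][j] * u[k] for k in range(n))
    return plan(K, u, v)
```

Calling `sinkhorn(C, r, c)` or `greenkhorn(C, r, c)` with a cost matrix `C` given as a list of lists and strictly positive marginals `r` and `c` that each sum to one returns an approximately feasible plan. Both routines scale the same kernel, so for the same `eta` they approach the same regularized solution; they differ only in how much work each update performs.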

Michael I. Jordan | Tianyi Lin | Nhat Ho
