On the Acceleration of the Sinkhorn and Greenkhorn Algorithms for Optimal Transport

We propose and analyze a novel approach to accelerate the Sinkhorn and Greenkhorn algorithms for solving the entropic regularized optimal transport (OT) problems. Focusing on the discrete setting where the probability distributions have at most n atoms, and letting ε ∈ (0, 1) denote the tolerance, we introduce accelerated algorithms that have complexity bounds of Õ ( n/ε ) . This improves on the known complexity bound of Õ ( n/ε ) for the Sinkhorn and Greenkhorn algorithms. We also present two hybrid algorithms that use the new accelerated algorithms to initialize the Sinkhorn and Greenkhorn algorithms, and we establish complexity bounds of Õ ( n/ε ) for these hybrid algorithms. We provide an extensive experimental comparison on both synthetic and real datasets to explore the relative advantages of the new algorithms.

[1]  Alexander Gasnikov,et al.  Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn's Algorithm , 2018, ICML.

[2]  Vahab S. Mirrokni,et al.  Accelerating Greedy Coordinate Descent Methods , 2018, ICML.

[3]  Jason Altschuler,et al.  Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration , 2017, NIPS.

[4]  Alessandro Rudi,et al.  Approximating the Quadratic Transportation Metric in Near-Linear Time , 2018, ArXiv.

[5]  Jonah Sherman,et al.  Area-convexity, l∞ regularization, and undirected multicommodity flow , 2017, STOC.

[6]  Kent Quanrud,et al.  Approximating optimal transport with linear programs , 2018, SOSA.

[7]  Yin Tat Lee,et al.  Path Finding Methods for Linear Programming: Solving Linear Programs in Õ(vrank) Iterations and Faster Algorithms for Maximum Flow , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[8]  Michael I. Jordan,et al.  Probabilistic Multilevel Clustering via Composite Transportation Distance , 2018, AISTATS.

[9]  Gabriel Peyré,et al.  A Smoothed Dual Approach for Variational Wasserstein Problems , 2015, SIAM J. Imaging Sci..

[10]  Michael I. Jordan,et al.  On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms , 2019, ICML.

[11]  Aaron Sidford,et al.  Towards Optimal Running Times for Optimal Transport , 2018, ArXiv.

[12]  Nicolas Courty,et al.  Optimal Transport for Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Lin Xiao,et al.  An Accelerated Randomized Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization , 2015, SIAM J. Optim..

[14]  Arnaud Doucet,et al.  Fast Computation of Wasserstein Barycenters , 2013, ICML.

[15]  Vivien Seguy,et al.  Smooth and Sparse Optimal Transport , 2017, AISTATS.

[16]  Peter Richtárik,et al.  Accelerated, Parallel, and Proximal Coordinate Descent , 2013, SIAM J. Optim..

[17]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[18]  Kevin Tian,et al.  A Direct Õ(1/ε) Iteration Parallel Algorithm for Optimal Transport , 2019, ArXiv.

[19]  Alessandro Rudi,et al.  Massively scalable Sinkhorn distances via the Nyström method , 2018, NeurIPS.

[20]  Gabriel Peyré,et al.  Stochastic Optimization for Large-scale Optimal Transport , 2016, NIPS.

[21]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[22]  Sanjeev Khanna,et al.  Better and simpler error analysis of the Sinkhorn–Knopp algorithm for matrix scaling , 2018, Mathematical Programming.

[23]  X. Nguyen Convergence of latent mixing measures in finite and infinite mixture models , 2011, 1109.3250.

[24]  Richard Sinkhorn Diagonal equivalence to matrices with prescribed row and column sums. II , 1967 .

[25]  Bahman Kalantari,et al.  On the complexity of general matrix scaling and entropy minimization via the RAS algorithm , 2007, Math. Program..

[26]  Darina Dvinskikh,et al.  Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters , 2018, NeurIPS.

[27]  C. Villani Optimal Transport: Old and New , 2008 .

[28]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[29]  L. Kantorovich On the Translocation of Masses , 2006 .

[30]  Lin Xiao,et al.  On the complexity analysis of randomized block-coordinate descent methods , 2013, Mathematical Programming.

[31]  David B. Dunson,et al.  Scalable Bayes via Barycenter in Wasserstein Space , 2015, J. Mach. Learn. Res..

[32]  Michael I. Jordan,et al.  Accelerated Primal-Dual Coordinate Descent for Computational Optimal Transport , 2019, ArXiv.

[33]  Philip A. Knight,et al.  The Sinkhorn-Knopp Algorithm: Convergence and Applications , 2008, SIAM J. Matrix Anal. Appl..

[34]  Nathaniel Lahn,et al.  A Graph Theoretic Additive Approximation of Optimal Transport , 2019, NeurIPS.