On the Complexity of Approximating Multimarginal Optimal Transport

We study the complexity of approximating the multimarginal optimal transport (MOT) distance, a generalization of the classical optimal transport distance, considered here between $m$ discrete probability distributions, each supported on $n$ points. First, we show that the standard linear programming (LP) representation of the MOT problem is not a minimum-cost flow problem when $m \geq 3$. This negative result implies that some combinatorial algorithms, e.g., the network simplex method, are not suitable for approximating the MOT problem, while the worst-case complexity bound for the deterministic interior-point algorithm remains $\tilde{O}(n^{3m})$. We then propose two simple and \textit{deterministic} algorithms for approximating the MOT problem. The first algorithm, which we refer to as the \textit{multimarginal Sinkhorn} algorithm, is a provably efficient multimarginal generalization of the Sinkhorn algorithm. We show that it achieves a complexity bound of $\tilde{O}(m^3n^m\varepsilon^{-2})$ for a tolerance $\varepsilon \in (0, 1)$. This provides a first \textit{near-linear time} complexity guarantee for approximating the MOT problem and matches the best known complexity bound for the Sinkhorn algorithm in the classical OT setting when $m = 2$. The second algorithm, which we refer to as the \textit{accelerated multimarginal Sinkhorn} algorithm, achieves acceleration by incorporating an estimate sequence, and its complexity bound is $\tilde{O}(m^3n^{m+1/3}\varepsilon^{-4/3})$. This bound is better than that of the first algorithm in terms of $1/\varepsilon$, and better than that of the accelerated alternating minimization algorithm~\citep{Tupitsa-2020-Multimarginal} in terms of $n$. Finally, we compare our new algorithms with the commercial LP solver \textsc{Gurobi}. Preliminary results on synthetic data and real images demonstrate the effectiveness and efficiency of our algorithms.
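To make the idea of a multimarginal generalization of Sinkhorn scaling concrete, here is a minimal sketch for $m = 3$ in plain Python. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: it assumes an entropic regularization parameter `eta`, strictly positive marginals, and simple cyclic updates of one scaling vector per marginal, and it omits any final rounding step. The paper's method may differ, for example in how the marginal to update is chosen. The names `axis_marginal` and `multimarginal_sinkhorn_3` are hypothetical.

```python
import math
from itertools import product


def axis_marginal(tensor, axis, n):
    """Sum a nested-list n x n x n tensor over every index except `axis`."""
    out = [0.0] * n
    for idx in product(range(n), repeat=3):
        i, j, k = idx
        out[idx[axis]] += tensor[i][j][k]
    return out


def multimarginal_sinkhorn_3(cost, marginals, eta=0.5, tol=1e-8, max_iter=10_000):
    """Illustrative entropic MOT scaling for three marginals on n points each.

    cost[i][j][k] is the cost tensor; marginals is a list of three strictly
    positive probability vectors. Returns a tensor whose three marginals match
    the targets up to a total l1 error of `tol` (if max_iter is not exhausted).
    """
    n = len(marginals[0])
    # Gibbs kernel exp(-C / eta); every entry is strictly positive.
    kernel = [[[math.exp(-cost[i][j][k] / eta) for k in range(n)]
               for j in range(n)] for i in range(n)]
    # One scaling vector per marginal (assumption: cyclic updates).
    scale = [[1.0] * n for _ in range(3)]

    def plan():
        # Current coupling: kernel rescaled along each of the three axes.
        return [[[kernel[i][j][k] * scale[0][i] * scale[1][j] * scale[2][k]
                  for k in range(n)] for j in range(n)] for i in range(n)]

    for _ in range(max_iter):
        current = plan()
        # Total l1 violation of the three marginal constraints.
        err = sum(abs(got - want)
                  for a in range(3)
                  for got, want in zip(axis_marginal(current, a, n), marginals[a]))
        if err <= tol:
            break
        for a in range(3):
            # Rescale axis `a` so that its marginal matches the target exactly.
            got = axis_marginal(plan(), a, n)
            scale[a] = [s * want / g
                        for s, want, g in zip(scale[a], marginals[a], got)]
    return plan()
```

Each rescaling touches all $n^3$ entries of the tensor, which is the $n^m$ factor for $m = 3$ that appears in the complexity bounds above; the dependence on $m$ and $\varepsilon$ is not something this sketch demonstrates.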

[1]  Kent Quanrud,et al.  Approximating optimal transport with linear programs , 2018, SOSA.

[2]  Marco Cuturi,et al.  Computational Optimal Transport: With Applications to Data Science , 2019 .

[3]  Aaron Sidford,et al.  Towards Optimal Running Times for Optimal Transport , 2018, ArXiv.

[4]  C. Villani Topics in Optimal Transportation , 2003 .

[5]  Andrew V. Goldberg,et al.  Beyond the flow decomposition barrier , 1998, JACM.

[6]  L. Kantorovich On the Translocation of Masses , 2006 .

[7]  Knut-Andreas Lie,et al.  Scale Space and Variational Methods in Computer Vision , 2019, Lecture Notes in Computer Science.

[8]  Jonathan Weed,et al.  Statistical bounds for entropic optimal transport: sample complexity and the central limit theorem , 2019, NeurIPS.

[9]  Yin Tat Lee,et al.  Path Finding Methods for Linear Programming: Solving Linear Programs in Õ(√rank) Iterations and Faster Algorithms for Maximum Flow , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[10]  Daniel A. Spielman,et al.  Faster approximate lossy generalized flow via interior point algorithms , 2008, STOC.

[11]  Justin Solomon,et al.  Parallel Streaming Wasserstein Barycenters , 2017, NIPS.

[12]  Guillaume Carlier,et al.  Barycenters in the Wasserstein Space , 2011, SIAM J. Math. Anal..

[13]  Michael I. Jordan,et al.  On the Efficiency of the Sinkhorn and Greenkhorn Algorithms and Their Acceleration for Optimal Transport , 2019 .

[14]  Richard M. Karp,et al.  Reducibility Among Combinatorial Problems , 1972, 50 Years of Integer Programming.

[15]  Brendan Pass Multi-marginal optimal transport: theory and applications , 2014, 1406.0026.

[16]  Nathaniel Lahn,et al.  A Graph Theoretic Additive Approximation of Optimal Transport , 2019, NeurIPS.

[17]  Alexander Schrijver,et al.  Combinatorial optimization. Polyhedra and efficiency. , 2003 .

[18]  Kevin Tian,et al.  A Direct Õ(1/ε) Iteration Parallel Algorithm for Optimal Transport , 2019, ArXiv.

[19]  Éva Tardos,et al.  A strongly polynomial minimum cost circulation algorithm , 1985, Comb..

[20]  S. Guminov,et al.  Accelerated Alternating Minimization, Accelerated Sinkhorn's Algorithm and Accelerated Iterative Bregman Projections. , 2019 .

[21]  Michael I. Jordan,et al.  On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms , 2019, ICML.

[22]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[23]  A. Galichon,et al.  A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options , 2014, 1401.3921.

[24]  Y. Brenier Generalized solutions and hydrostatic approximation of the Euler equations , 2008 .

[25]  Shiguang Shan,et al.  AttGAN: Facial Attribute Editing by Only Changing What You Want , 2017, IEEE Transactions on Image Processing.

[26]  David B. Dunson,et al.  Scalable Bayes via Barycenter in Wasserstein Space , 2015, J. Mach. Learn. Res..

[27]  Mingkui Tan,et al.  Multi-marginal Wasserstein GAN , 2019, NeurIPS.

[28]  Codina Cotar,et al.  Density Functional Theory and Optimal Transportation with Coulomb Cost , 2011, 1104.0603.

[29]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Kevin Tian,et al.  A Direct Õ(1/ε) Iteration Parallel Algorithm for Optimal Transport , 2019, NeurIPS.

[31]  Alexander Gasnikov,et al.  Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn's Algorithm , 2018, ICML.

[32]  Zeyuan Allen Zhu,et al.  Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling , 2015, ICML.

[33]  Refael Hassin,et al.  The minimum cost flow problem: A unifying approach to dual algorithms and a new tree-search algorithm , 1983, Math. Program..

[34]  César A. Uribe,et al.  Multimarginal Optimal Transport by Accelerated Gradient Descent , 2020 .

[35]  Michael I. Jordan,et al.  Revisiting Fixed Support Wasserstein Barycenter: Computational Hardness and Efficient Algorithms , 2020, ArXiv.

[36]  Stephen J. Wright Primal-Dual Interior-Point Methods , 1997, Other Titles in Applied Mathematics.

[37]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[38]  S. Thomas McCormick,et al.  Canceling most helpful total cuts for minimum cost network flow , 1993, Networks.

[39]  John N. Tsitsiklis,et al.  Parallel and distributed computation , 1989 .

[40]  Julien Rabin,et al.  Wasserstein Barycenter and Its Application to Texture Mixing , 2011, SSVM.

[41]  Lin Xiao,et al.  An Accelerated Randomized Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization , 2015, SIAM J. Optim..

[42]  Darina Dvinskikh,et al.  On the Complexity of Approximating Wasserstein Barycenters , 2019, ICML.

[43]  Jason Altschuler,et al.  Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration , 2017, NIPS.

[44]  M. Klein A Primal Method for Minimal Cost Flows with Applications to the Assignment and Transportation Problems , 1966 .

[45]  Yurii Nesterov,et al.  Smooth minimization of non-smooth functions , 2005, Math. Program..

[46]  Mark W. Schmidt,et al.  Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection , 2015, ICML.

[47]  Tommi S. Jaakkola,et al.  Convergence Rate Analysis of MAP Coordinate Minimization Algorithms , 2012, NIPS.

[48]  P. Chiappori,et al.  Hedonic price equilibria, stable matching, and optimal transport: equivalence, topology, and uniqueness , 2007 .

[49]  G. Carlier,et al.  Matching for teams , 2010 .

[50]  Lin Xiao,et al.  On the complexity analysis of randomized block-coordinate descent methods , 2013, Mathematical Programming.

[51]  Pradeep Ravikumar,et al.  Nearest Neighbor based Greedy Coordinate Descent , 2011, NIPS.

[52]  Gabriel Peyré,et al.  Iterative Bregman Projections for Regularized Transportation Problems , 2014, SIAM J. Sci. Comput..

[53]  G. Buttazzo,et al.  Optimal-transport formulation of electronic density-functional theory , 2012, 1205.4514.

[54]  Vahab S. Mirrokni,et al.  Accelerating Greedy Coordinate Descent Methods , 2018, ICML.

[55]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[56]  Andrew V. Goldberg,et al.  Finding minimum-cost circulations by canceling negative cycles , 1989, JACM.

[57]  S. Thomas McCormick,et al.  Two Strongly Polynomial Cut Cancelling Algorithms for Minimum Cost Network Flow , 1993, Discret. Appl. Math..

[58]  Refael Hassin Algorithms for the minimum cost circulation problem based on maximizing the mean improvement , 1992, Oper. Res. Lett..

[59]  James B. Orlin,et al.  A polynomial time primal network simplex algorithm for minimum cost flows , 1996, SODA '96.

[60]  Y. Brenier Minimal geodesics on groups of volume-preserving maps and generalized solutions of the Euler equations , 1999 .

[61]  Bahman Kalantari,et al.  On the complexity of general matrix scaling and entropy minimization via the RAS algorithm , 2007, Math. Program..

[62]  Nizar Touzi,et al.  A Stochastic Control Approach to No-Arbitrage Bounds Given Marginals, with an Application to Lookback Options , 2013, 1401.3921.

[63]  Richard Sinkhorn Diagonal equivalence to matrices with prescribed row and column sums. II , 1967 .

[64]  Steffen Borgwardt,et al.  Discrete Wasserstein barycenters: optimal transport for discrete data , 2015, Math. Methods Oper. Res..

[65]  A. Guillin,et al.  On the rate of convergence in Wasserstein distance of the empirical measure , 2013, 1312.2128.

[66]  Éva Tardos,et al.  An O(n^2(m + n log n) log n) min-cost flow algorithm , 1988, JACM.

[67]  Richard M. Karp,et al.  Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems , 1972, Combinatorial Optimization.

[68]  R. Dudley The Speed of Mean Glivenko-Cantelli Convergence , 1969 .

[69]  P. Gori-Giorgi,et al.  Strictly correlated electrons in density-functional theory: A general formulation with applications to spherical densities , 2007, cond-mat/0701025.

[70]  Shang-Hua Teng,et al.  Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems , 2003, STOC '04.

[71]  Claude Berge,et al.  The Theory Of Graphs , 1962 .

[72]  Michael I. Jordan,et al.  On the Acceleration of the Sinkhorn and Greenkhorn Algorithms for Optimal Transport , 2019, ArXiv.

[73]  Arnaud Doucet,et al.  Fast Computation of Wasserstein Barycenters , 2013, ICML.

[74]  James B. Orlin,et al.  A faster strongly polynomial minimum cost flow algorithm , 1993, STOC '88.

[75]  Jean-David Benamou,et al.  Generalized incompressible flows, multi-marginal transport and Sinkhorn algorithm , 2017, Numerische Mathematik.

[76]  Adam M. Oberman,et al.  NUMERICAL METHODS FOR MATCHING FOR TEAMS AND WASSERSTEIN BARYCENTERS , 2014, 1411.3602.

[77]  Lin Lin,et al.  Kantorovich dual solution for strictly correlated electrons in atoms and molecules , 2012, 1210.7117.

[78]  Robert E. Tarjan,et al.  Dynamic trees as search trees via euler tours, applied to the network simplex algorithm , 1997, Math. Program..

[79]  Andrew V. Goldberg,et al.  Finding Minimum-Cost Circulations by Successive Approximation , 1990, Math. Oper. Res..

[80]  Le Hui,et al.  Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders , 2017, 2018 24th International Conference on Pattern Recognition (ICPR).

[81]  Yin Tat Lee,et al.  Solving linear programs in the current matrix multiplication time , 2018, STOC.

[82]  H. Soner,et al.  Robust Hedging and Martingale Optimal Transport in Continuous Time , 2012 .

[83]  Peter Richtárik,et al.  Accelerated, Parallel, and Proximal Coordinate Descent , 2013, SIAM J. Optim..

[84]  Yurii Nesterov,et al.  Lectures on Convex Optimization , 2018 .

[85]  W. Gangbo,et al.  Optimal maps for the multidimensional Monge-Kantorovich problem , 1998 .

[86]  Gabriel Peyré,et al.  Semi-dual Regularized Optimal Transport , 2018, SIAM Rev..

[87]  Gabriel Peyré,et al.  Sample Complexity of Sinkhorn Divergences , 2018, AISTATS.

[88]  Y. Brenier The least action principle and the related concept of generalized flows for incompressible perfect fluids , 1989 .

[89]  Jing Lei Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces , 2018, Bernoulli.

[90]  Steve Oudot,et al.  Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport , 2018, NeurIPS.

[91]  Liang Mi,et al.  Multi-Marginal Optimal Transport Defines a Generalized Metric , 2020, ArXiv.
