Interpolating between Optimal Transport and MMD using Sinkhorn Divergences

Comparing probability distributions is a fundamental problem in data science. Simple norms and divergences such as the total variation and the relative entropy only compare densities pointwise and fail to capture the geometric nature of the problem. In sharp contrast, Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between measures that take into account the geometry of the underlying space and metrize the convergence in law. This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT. Relying on a new notion of geometric entropy, we provide theoretical guarantees for these divergences: positivity, convexity and metrization of the convergence in law. On the practical side, we detail a numerical scheme that enables the large-scale application of these divergences in machine learning: on the GPU, gradients of the Sinkhorn loss can be computed for batches of a million samples.
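To make the interpolation concrete, here is a minimal sketch of a Sinkhorn divergence between two weighted point clouds. It assumes the usual debiased form S_eps(a, b) = OT_eps(a, b) - (1/2) OT_eps(a, a) - (1/2) OT_eps(b, b), where OT_eps is optimal transport regularized by eps times the relative entropy of the plan with respect to the product of the marginals. The abstract does not spell this formula out, so the definition, the cost |x - y|^2 / 2, the function names and the fixed iteration count are all assumptions of this illustration. It uses dense NumPy arrays and is not the GPU scheme mentioned above.

```python
import numpy as np

def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis.
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))

def ot_eps(a, x, b, y, eps, n_iter=500):
    """Entropic OT value between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j}.

    Assumed cost: C(x, y) = |x - y|^2 / 2. Runs log-domain Sinkhorn updates
    on the dual potentials f, g and returns the dual value <a, f> + <b, g>.
    """
    C = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(n_iter):
        # Alternating "softmin" updates of the two dual potentials.
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    # After the g-update the exponential term of the dual vanishes exactly.
    return a @ f + b @ g

def sinkhorn_divergence(a, x, b, y, eps, n_iter=500):
    # Debiased divergence: subtract the two "self-transport" terms.
    return (ot_eps(a, x, b, y, eps, n_iter)
            - 0.5 * ot_eps(a, x, a, x, eps, n_iter)
            - 0.5 * ot_eps(b, y, b, y, eps, n_iter))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.random((50, 2)), rng.random((60, 2)) + 0.5
    a, b = np.full(50, 1 / 50), np.full(60, 1 / 60)
    for eps in (0.05, 0.5, 1e3):
        print(eps, sinkhorn_divergence(a, x, b, y, eps))
```

Under these assumptions, a small eps brings the value close to the unregularized transport cost, while a large eps brings it close to a kernel (MMD-type) discrepancy; with this quadratic cost the large-eps limit reduces to half the squared distance between the two means.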
