Information Geometry of Wasserstein Divergence

A manifold of probability distributions carries two geometrical structures. One is invariant and based on the Fisher information; the other is based on the Wasserstein distance of optimal transportation. We propose a unified framework that connects the Wasserstein distance and the Kullback–Leibler (KL) divergence to give a new information-geometrical theory. We consider the discrete case of \(n\) elements and study the geometry of the probability simplex \(S_{n-1}\), the set of all probability distributions over \(n\) atoms. The Wasserstein distance is introduced in \(S_{n-1}\) through the optimal transportation of commodities from a distribution \(\boldsymbol{p} \in S_{n-1}\) to \(\boldsymbol{q} \in S_{n-1}\). Following Cuturi (2013), we relax the optimal transportation problem by an entropy term and show that the entropy-relaxed transportation plan naturally defines an exponential family and a dually flat structure in the sense of information geometry. Although the optimal transportation cost itself does not define a distance function, we introduce a novel divergence function in \(S_{n-1}\) that connects the relaxed Wasserstein distance to the KL divergence through a single parameter.
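
As a concrete illustration of the entropy-relaxed transportation on which this construction rests, the following is a minimal sketch of the Sinkhorn iterations commonly used to solve Cuturi's (2013) relaxed problem \(\min_P \langle C, P \rangle - \tfrac{1}{\lambda} H(P)\) subject to the marginal constraints \(P \mathbf{1} = \boldsymbol{p}\), \(P^{\top} \mathbf{1} = \boldsymbol{q}\). The names `sinkhorn`, `C`, and `lam`, and the toy ground cost, are illustrative choices rather than notation from the paper. Note that the fixed point has the form \(P_{ij} \propto u_i \, e^{-\lambda C_{ij}} \, v_j\), which is precisely the exponential-family form of the entropy-relaxed plan referred to above.

```python
import numpy as np

def sinkhorn(p, q, C, lam=10.0, n_iter=200):
    """Entropy-relaxed optimal transport between p, q in the simplex S_{n-1}.

    Approximately solves  min_P <C, P> - (1/lam) H(P)
    subject to P @ 1 = p and P.T @ 1 = q  (Cuturi, 2013).
    Returns the transportation plan P and its transport cost <C, P>.
    """
    K = np.exp(-lam * C)              # Gibbs kernel; the plan has the form diag(u) K diag(v)
    u = np.ones_like(p)
    for _ in range(n_iter):           # alternate projections onto the two marginal constraints
        v = q / (K.T @ u)
        u = p / (K @ v)
    P = u[:, None] * K * v[None, :]   # entropy-relaxed transportation plan
    return P, np.sum(P * C)

# Toy example on S_2 (n = 3 atoms) with an illustrative |i - j| ground cost
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.4, 0.5])
C = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
P, cost = sinkhorn(p, q, C)
print(P.sum(axis=1), P.sum(axis=0))   # marginals approach p and q
print("relaxed transport cost:", cost)
```

As \(\lambda \to \infty\) the relaxation tightens toward the unregularized optimal transport cost, while small \(\lambda\) spreads the plan toward the product distribution; this one-parameter behavior is what the divergence introduced in the paper exploits to connect the Wasserstein and KL sides.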