Shibing Chen, *University of Science and Technology of China*

Young-Heon Kim, *University of British Columbia*

Soumik Pal, *University of Washington*

Brendan Pass, *University of Alberta*

Asuka Takatsu, *Tokyo Metropolitan University*

Consider the Monge-Kantorovich optimal transport problem where the cost function is given by a Bregman divergence. The associated transport cost, termed the Bregman-Wasserstein divergence here, provides a natural asymmetric extension of the (squared) 2-Wasserstein metric and has recently found applications in statistics and machine learning. On the other hand, the Bregman divergence is a fundamental concept in information geometry and induces a dually flat geometry on the underlying manifold. Using the Bregman-Wasserstein divergence, we lift this dualistic geometry to the space of probability measures, yielding an extension of Otto’s weak Riemannian structure of the Wasserstein space to statistical manifolds. We do this by generalizing Lott’s formal geometric computations for the Wasserstein space. In particular, we define generalized displacement interpolations which are compatible with the Bregman geometry, and construct conjugate primal and dual connections on the space of distributions. We also discuss some potential applications. Ongoing joint work with Cale Rankin.
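For concreteness, the cost in question can be spelled out (standard notation, not quoted from the talk): for a strictly convex, differentiable potential $\phi$, the Bregman divergence and the induced transport cost are
$$
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle,
\qquad
B_\phi(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int D_\phi(x, y) \, d\pi(x, y).
$$
Since $D_\phi$ is asymmetric in general, so is $B_\phi$; taking $\phi(x) = \|x\|^2$ gives $D_\phi(x, y) = \|x - y\|^2$ and recovers the squared 2-Wasserstein metric.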

Modern advances in technology have led to the generation of ever-increasing amounts of quantitative data from biological systems, such as gene-expression snapshots of developing cell populations in a tissue, or geometric data of residue positions within a protein. Experimental observations are often only partial, so information about the underlying process or structure must instead be inferred from the data. Through its connection to the Schrödinger problem of large deviations for stochastic processes, we find that entropic optimal transport arises, under precise assumptions, as a natural tool for reconstructing unobserved cellular trajectories. We develop both a theoretical and a computational framework for inferring cellular dynamics based on optimal transport, and demonstrate its potential to extract the genetic logic underlying biological dynamics. In another vein, we also discuss the utility of generalized notions of optimal transport for matching and summarizing topological features in geometric structures such as biomolecules.

Joint work with (the groups of) Prof. Geoffrey Schiebinger, Prof. Lénaïc Chizat and Prof. Michael Stumpf.
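Entropic optimal transport is typically computed with Sinkhorn iterations. The following is a minimal sketch of that standard scheme (illustrative only, not the authors' code; the toy data and names are assumptions):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, n_iter=500):
    """Entropic OT: find the coupling P of (mu, nu) minimizing
    <C, P> + eps * KL(P | mu x nu), via Sinkhorn's fixed-point iteration."""
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)             # rescale to match column marginals
        u = mu / (K @ v)               # rescale to match row marginals
    return u[:, None] * K * v[None, :]  # P = diag(u) K diag(v)

# toy example: two 1-D "snapshots" of a cell population
x = np.linspace(0.0, 1.0, 5)
y = x + 0.2
C = (x[:, None] - y[None, :]) ** 2     # squared-distance cost
mu = nu = np.full(5, 0.2)              # uniform weights
P = sinkhorn(mu, nu, C)
```

The returned matrix `P` is a coupling whose marginals match `mu` and `nu`; as `eps` tends to zero it approaches the unregularized optimal transport plan.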

Adversarial training is a framework widely used by practitioners to enforce robustness of machine learning models. During the training process, the learner is pitted against an adversary who has the power to alter the input data. As a result, the learner is forced to build a model that is robust to data perturbations. Despite the importance and relative conceptual simplicity of adversarial training, many aspects are still not well understood (e.g., regularization effects, geometric/analytic interpretations, the tradeoff between accuracy and robustness), particularly in the case of multiclass classification.

In this talk, I will show that in the non-parametric setting, the adversarial training problem is equivalent to a generalized version of the Wasserstein barycenter problem. The connection between these problems allows us to completely characterize the optimal adversarial strategy and to bring in tools from optimal transport to analyze and compute optimal classifiers. This also has implications for the parametric setting, as the value of the generalized barycenter problem gives a universal upper bound on the robustness/accuracy tradeoff inherent to adversarial training.

Joint work with Nicolas Garcia Trillos and Jakwang Kim.
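For context, the classical Wasserstein barycenter problem (the talk concerns a generalized version whose precise form is not reproduced here) reads
$$
\min_{\bar\mu} \; \sum_{i=1}^{k} \lambda_i \, W_2^2(\mu_i, \bar\mu),
\qquad \lambda_i \ge 0, \; \sum_{i=1}^{k} \lambda_i = 1,
$$
where the minimum runs over probability measures $\bar\mu$; the adversarial training problem corresponds to a suitable generalization of this variational problem.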

Interest from the machine learning community in optimal transport has surged over the past five years. A key reason for this is that the Wasserstein metric provides a unique way to measure the distance between data distributions—one that respects the geometry of the underlying space and behaves well even when distributions lack overlapping support.

In today’s talk, I will present two recent works that leverage the benefits of the Wasserstein metric in vastly different contexts. First, I will describe how the Wasserstein metric can be used to define a novel notion of archetypal analysis, in which one approximates a data distribution by a uniform probability measure on a convex polygon, so that the vertices provide exemplars of extreme points of the data. Next, I will discuss an application of optimal transport to collider physics, in which comparing collider events using the Wasserstein metric allowed us to achieve state-of-the-art accuracy with vastly improved computational efficiency. In both cases, I will discuss both the theoretical benefits and the computational challenges of optimal transport in the machine learning context.
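The claim that the Wasserstein metric behaves well for distributions without overlapping support is easy to illustrate in one dimension, where $W_1$ between two equal-size empirical measures reduces to matching sorted values (a generic sketch, not code from the works discussed):

```python
import numpy as np

def w1_empirical(x, y):
    """W1 between empirical measures of two equal-size 1-D samples:
    the optimal matching pairs sorted values with sorted values."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return np.mean(np.abs(x - y))

# distributions with disjoint supports still compare sensibly:
a = np.array([0.0, 1.0, 2.0])
b = a + 10.0                      # same shape, shifted far away
print(w1_empirical(a, b))         # prints 10.0, the size of the shift
```

Unlike KL-type divergences, which blow up when supports are disjoint, the distance here varies smoothly with the shift, which is precisely what makes it attractive as a loss in machine learning.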

We investigate barycenters of probability measures on Gromov hyperbolic spaces, toward the development of convex optimization in this class of metric spaces. We establish a contraction property in terms of the Wasserstein distance, a deterministic approximation of barycenters of uniform distributions on finitely many points, and a kind of law of large numbers. These generalize the corresponding results on CAT(0)-spaces, up to additional terms depending on the hyperbolicity constant.

Inspired by Perelman’s work on the entropy formula for Ricci flow, we introduce the $W$-entropy and prove its monotonicity and a rigidity theorem for the geodesic flow on the Wasserstein space over a Riemannian manifold. This improves an earlier result due to Lott and Villani on the displacement convexity of the Boltzmann-type entropy on Riemannian manifolds with non-negative Ricci curvature. We then introduce the Langevin deformation on the Wasserstein space, which interpolates between the Wasserstein geodesic flow and the gradient flow of the Boltzmann entropy (i.e., the heat equation on the underlying manifold). Moreover, we present a $W$-entropy type formula for the Langevin deformation. Joint work with Songzi Li.

Optimal transport studies the most economical movement of resources. In other words, one considers a pile of raw material and wants to transport it to a final configuration in a cost-efficient way. Under quite general assumptions, the solution to this problem will be induced by a transport map where the mass at each point in the initial distribution is sent to a unique point in the target distribution. In this talk, we will discuss the regularity of this transport map (i.e., whether nearby points in the first pile are sent to nearby points in the second pile).

It turns out there are both local and global obstructions to establishing smoothness for the transport. When the cost is induced by a convex potential, we show that the local obstruction corresponds to the curvature of an associated Kähler manifold and discuss the geometry of this curvature tensor. In particular, we show (somewhat surprisingly) that its negativity is preserved along Kähler-Ricci flow.

We discuss the gradient flow structure of hydrodynamic limit equations obtained from microscopic interacting particle systems. We first show that for a wide class of reversible non-degenerate interacting particle systems, their hydrodynamic limit equations are formally given as a gradient flow with respect to the Wasserstein metric with mobility. Then we discuss when such a formulation can be justified rigorously, and how it can be applied to the study of hydrodynamic limits. This talk is based on an ongoing work with Kohei Hayashi.
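Schematically, a gradient flow of an energy $E$ with respect to a Wasserstein metric with mobility $\sigma(\rho)$ takes the form (a generic sketch, not the specific equations of the talk)
$$
\partial_t \rho = \nabla \cdot \Big( \sigma(\rho) \, \nabla \frac{\delta E}{\delta \rho}(\rho) \Big).
$$
For the linear mobility $\sigma(\rho) = \rho$ and the entropy $E(\rho) = \int \rho \log \rho \, dx$, one has $\delta E / \delta \rho = \log \rho + 1$, and the right-hand side becomes $\nabla \cdot (\rho \, \nabla \rho / \rho) = \Delta \rho$, recovering the heat equation as in the classical Jordan-Kinderlehrer-Otto picture.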

Multi-marginal optimal transport, a natural extension of the well-known classical optimal transport problem, is the problem of correlating given probability measures as efficiently as possible relative to a given cost function. Although a variety of applications have arisen over the past twelve years, the structure of solutions for the multi-marginal case has been difficult to address, mainly due to the strong dependence on the cost function. In this talk, I will briefly outline the known theory for uniqueness of this problem. Next, I will present a recent joint work with Brendan Pass based on a general condition on the cost function that provides uniqueness.
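In standard notation (assumed here, not taken from the talk), the multi-marginal problem is
$$
\inf_{\gamma \in \Pi(\mu_1, \ldots, \mu_m)} \int c(x_1, \ldots, x_m) \, d\gamma(x_1, \ldots, x_m),
$$
where $\Pi(\mu_1, \ldots, \mu_m)$ denotes the probability measures on the product space whose $i$-th marginal is $\mu_i$; the case $m = 2$ is the classical optimal transport problem.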

Consider the problem of matching two independent sets of $N$ i.i.d. observations from two densities $\rho_0$ and $\rho_1$ in $\mathbb{R}^d$. For an arbitrary continuous cost function, the optimal assignment problem looks for the matching that minimizes the total cost. We consider instead the problem where each matching is endowed with a Gibbs probability weight proportional to the exponential of the negative total cost of that matching. Viewing each matching as a joint distribution with $N$ atoms, we then take a convex combination with respect to the above Gibbs probability measure. We show that the resulting random joint distribution converges, as $N\rightarrow \infty$, to the solution of a variational problem, introduced by Föllmer, called the Schrödinger problem. Finally, we prove limiting Gaussian fluctuations for this convergence in the form of Central Limit Theorems for integrated test functions. This is enabled by a novel chaos decomposition for permutation symmetric statistics, generalizing the Hoeffding decomposition for U-statistics. Our results establish a novel passage for the transition from discrete to continuum in Schrödinger’s lazy gas experiment.
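For small $N$ the Gibbs-weighted average of matchings can be computed by brute force over all $N!$ permutations. A minimal sketch of the construction described above, with an illustrative quadratic cost and made-up data (names and parameters are assumptions):

```python
import itertools
import numpy as np

def gibbs_coupling(x, y, beta=1.0):
    """Convex combination of the N! permutation couplings, each viewed as a
    joint distribution with N atoms and weighted by exp(-beta * total cost);
    cost c(x, y) = (x - y)^2."""
    n = len(x)
    C = (x[:, None] - y[None, :]) ** 2
    P = np.zeros((n, n))
    Z = 0.0
    for sigma in itertools.permutations(range(n)):
        w = np.exp(-beta * sum(C[i, sigma[i]] for i in range(n)))
        for i in range(n):
            P[i, sigma[i]] += w / n   # each matching puts mass 1/n per pair
        Z += w
    return P / Z                      # normalize by the partition function

x = np.array([0.0, 0.5, 1.0])
y = np.array([0.1, 0.6, 0.9])
P = gibbs_coupling(x, y, beta=5.0)
```

Each row and column of `P` sums to $1/N$, so `P` is a bona fide coupling of the two empirical measures; the talk concerns its limit as $N \to \infty$.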

In this talk, we will introduce some interesting applications of optimal transportation in various fields, including a reconstruction problem in cosmology; a short proof of the isoperimetric inequality in geometry; and an application in image recognition relating to a transport between hypercubes. This talk is based on a series of joint works with Shibing Chen, Xu-Jia Wang, and with Gregoire Loeper.