Consider the Monge-Kantorovich optimal transport problem where the cost function is given by a Bregman divergence. The associated transport cost, termed the Bregman-Wasserstein divergence here, presents a natural asymmetric extension of the (squared) 2-Wasserstein metric and has recently found applications in statistics and machine learning. On the other hand, Bregman divergence is a fundamental concept in information geometry and induces a dually flat geometry on the underlying manifold. Using the Bregman-Wasserstein divergence, we lift this dualistic geometry to the space of probability measures, yielding an extension of Otto’s weak Riemannian structure of the Wasserstein space to statistical manifolds. We do this by generalizing Lott’s formal geometric computations for the Wasserstein space. In particular, we define generalized displacement interpolations which are compatible with the Bregman geometry, and construct conjugate primal and dual connections on the space of distributions. We also discuss some potential applications. Ongoing joint work with Cale Rankin.
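
For reference, the standard definitions behind this abstract can be sketched as follows. The notation is chosen here for illustration and is not necessarily the speaker's: $\phi$ is a differentiable strictly convex function, $D_\phi$ its Bregman divergence, $\Pi(\mu,\nu)$ the set of couplings of $\mu$ and $\nu$, and $\mathcal{B}_\phi$ the resulting transport cost.

$$ D_\phi(x,y)=\phi(x)-\phi(y)-\langle \nabla\phi(y),\,x-y\rangle, \qquad \mathcal{B}_\phi(\mu,\nu)=\inf_{\pi\in\Pi(\mu,\nu)}\int D_\phi(x,y)\,d\pi(x,y). $$

With $\phi(x)=\tfrac{1}{2}|x|^2$ one gets $D_\phi(x,y)=\tfrac{1}{2}|x-y|^2$, so in that case $\mathcal{B}_\phi=\tfrac{1}{2}W_2^2$, which is the sense in which the divergence extends the squared 2-Wasserstein metric.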

Modern advances in technology have led to the generation of ever-increasing amounts of quantitative data from biological systems, such as gene-expression snapshots of developing cell populations in a tissue, or geometric data of residue positions within a protein. Experimental observations are often only partial, so information about the underlying process or structure must instead be inferred from data. Through its connection to the Schrödinger problem of large deviations for stochastic processes, entropic optimal transport arises as a natural tool for reconstructing unobserved cellular trajectories under precise assumptions. We develop both a theoretical and computational framework for inferring cellular dynamics based on optimal transport, and demonstrate its potential to extract the genetic logic underlying biological dynamics. In another vein, we also discuss the utility of generalized notions of optimal transport for matching and summarizing topological features in geometric structures such as biomolecules.

Joint work with (the groups of) Prof. Geoffrey Schiebinger, Prof. Lénaïc Chizat and Prof. Michael Stumpf.

Adversarial training is a framework widely used by practitioners to enforce robustness of machine learning models. During the training process, the learner is pitted against an adversary who has the power to alter the input data. As a result, the learner is forced to build a model that is robust to data perturbations. Despite the importance and relative conceptual simplicity of adversarial training, there are many aspects that are still not well understood (e.g. regularization effects, geometric/analytic interpretations, and the tradeoff between accuracy and robustness), particularly in the case of multiclass classification.

In this talk, I will show that in the non-parametric setting, the adversarial training problem is equivalent to a generalized version of the Wasserstein barycenter problem. The connection between these problems allows us to completely characterize the optimal adversarial strategy and to bring in tools from optimal transport to analyze and compute optimal classifiers. This also has implications for the parametric setting, as the value of the generalized barycenter problem gives a universal upper bound on the robustness/accuracy tradeoff inherent to adversarial training.

Joint work with Nicolas Garcia Trillos and Jakwang Kim.

Interest from the machine learning community in optimal transport has surged over the past five years. A key reason for this is that the Wasserstein metric provides a unique way to measure the distance between data distributions, one that respects the geometry of the underlying space and behaves well even when distributions lack overlapping support.

In today’s talk, I will present two recent works that leverage the benefits of the Wasserstein metric in vastly different contexts. First, I will describe how the Wasserstein metric can be used to define a novel notion of archetypal analysis, in which one approximates a data distribution by a uniform probability measure on a convex polygon, so that the vertices provide exemplars of extreme points of the data. Next, I will discuss an application of optimal transport to collider physics, in which comparing collider events using the Wasserstein metric allowed us to achieve state-of-the-art accuracy with vastly improved computational efficiency. In both cases, I will discuss the theoretical benefits and the computational challenges of optimal transport in the machine learning context.

We investigate barycenters of probability measures on Gromov hyperbolic spaces, toward the development of convex optimization in this class of metric spaces. We establish a contraction property in terms of the Wasserstein distance, a deterministic approximation of barycenters of uniform distributions on finite points, and a kind of law of large numbers. These generalize the corresponding results on CAT(0)-spaces, up to additional terms depending on the hyperbolicity constant.

Let $d\geq 2$ and let $H$ denote the absolute multiplicative Weil height on $\bar{\mathbb{Q}}$. Let $f(z)\in \bar{\mathbb{Q}}[z]$ be a polynomial of degree $d$ and let $a\in\bar{\mathbb{Q}}$. The multiplicative and logarithmic canonical heights of $a$ with respect to $f$ are defined as

$$ \hat{H}_f (a) =\lim_{n\to\infty} H(f^n(a))^{1/d^n} \quad \text{and}\quad \hat{h}_f(a)=\log \hat{H}_f(a). $$
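
As a quick sanity check of this definition (an illustrative example, not taken from the talk), consider the power map $f(z)=z^d$. Since $f^n(a)=a^{d^n}$ and $H(a^m)=H(a)^m$ for every positive integer $m$, the sequence in the limit is constant:

$$ \hat{H}_f(a)=\lim_{n\to\infty} H\big(a^{d^n}\big)^{1/d^n}=H(a), \quad\text{so for instance}\quad \hat{H}_f(2)=H(2)=2. $$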

Let $n$ be a positive integer. For $1\leq i\leq n$, let $f_i\in\bar{\mathbb{Q}}[z]$ be of degree $d$ and let $a_i\in\bar{\mathbb{Q}}$. In this talk, we provide a complete characterization of when the $\hat{H}_{f_i}(a_i)$’s are multiplicatively dependent modulo constants, meaning that there exist integers $m_1,\ldots,m_n$, not all of which are $0$, and $a\in \bar{\mathbb{Q}}$ such that:

$$ \hat{H}_{f_1}(a_1)^{m_1} \cdots \hat{H}_{f_n}(a_n)^{m_n}=a. $$

As an immediate consequence, we characterize all the pairs $(f,a)$ such that $\hat{H}_f(a)$ is an algebraic number and prove the existence of $(f,a)$ such that $\hat{h}_f(a)$ is an irrational number. The proof uses the Medvedev-Scanlon classification of preperiodic subvarieties under the dynamics of a split polynomial map and the construction of a certain auxiliary polynomial. This is joint work with Jason Bell.

In 1900, Hilbert posed the following problem: “Given a Diophantine equation with integer coefficients: to devise a process according to which it can be determined in a finite number of operations whether the equation is solvable in (rational) integers.”

Building on the work of several mathematicians, in 1970, Matiyasevich proved that this problem has a negative answer, i.e., such a general ‘process’ (algorithm) does not exist.

In the late 1970’s, Denef–Lipshitz formulated an analogue of Hilbert’s 10th problem for rings of integers of number fields.

In recent years, techniques from arithmetic geometry have been used extensively to attack this problem. One such instance is the work of García-Fritz and Pasten (from 2019) which showed that the analogue of Hilbert’s 10th problem is unsolvable in the rings of integers of number fields of the form $\mathbb{Q}(\sqrt[3]{p},\sqrt{-q})$ for positive proportions of primes $p$ and $q$. In joint work with Lei and Sprung, we improve their proportions and extend their results in several directions.

We study Campana’s orbifold conjecture for finite ramified covers of $\mathbb P^2$ with three components admitting sufficiently large multiplicities. We also prove a truncated second main theorem of level one for analytic maps into $\mathbb P^2$ intersecting the coordinate lines with sufficiently high multiplicities. In particular, the exceptional set for the latter result can be described explicitly.

This is joint work with Ji Guo.

In this talk, I will explain how the Batyrev-Manin conjecture on rational points can be generalized to Deligne-Mumford stacks by using twisted sectors. In the original conjecture, the so-called a- and b-invariants are determined by the positions of the ample line bundle in question and the canonical divisor in the Néron-Severi space relative to the pseudo-effective cone. To generalize to stacks, we introduce orbifold versions of these algebro-geometric notions. Once we define them suitably, the generalized conjecture is formulated more or less in the same way as the original conjecture was. The Malle conjecture on Galois extensions of a number field is then regarded as a special case of it. This is joint work with Ratko Darda.

I will discuss local-global principles for two notions of semi-integral points, termed Campana points and Darmon points. In particular, I will introduce a semi-integral version of the Brauer-Manin obstruction interpolating between Manin’s classical version for rational points and the integral version developed by Colliot-Thélène and Xu. Lastly, we will apply these tools to study semi-integral points on quadric orbifolds.

The characteristic cycle of a constructible sheaf on a projective smooth algebraic variety is an algebraic cycle on the cotangent bundle that computes the Euler characteristic of the sheaf. In this talk, we consider a rank 1 sheaf on the variety. For a computation of the characteristic cycle of a rank 1 sheaf, we introduce a general theory called partially logarithmic ramification theory, and construct an algebraic cycle using several invariants measuring the ramification of the sheaf, which is compared with the characteristic cycle.

The construction of Kummer K3 surfaces from abelian surfaces can be generalized to yield higher dimensional varieties known as hyperkähler varieties of Kummer type. Hassett and Tschinkel showed that a portion of the middle cohomology of generalized Kummer 4-folds may be understood as fixed loci of symplectic involutions corresponding to the three-torsion points of the abelian surface. In recent work with Sarah Frei, we have extended this result, allowing us to characterize the Galois action on the cohomology when working over non-closed fields.