Leveraging Optimal Transport via Projections on Subspaces for Machine Learning Applications

Clément Bonet
2023-11-23
Abstract: Optimal Transport has received much attention in Machine Learning because it makes it possible to compare probability distributions by exploiting the geometry of the underlying space. However, in its original formulation, solving this problem suffers from a significant computational burden. Thus, a meaningful line of work consists in proposing alternatives that reduce this burden while still enjoying its properties. In this thesis, we focus on alternatives which use projections on subspaces. The main such alternative is the Sliced-Wasserstein distance, which we first propose to extend to Riemannian manifolds in order to use it in Machine Learning applications for which using such spaces has been shown to be beneficial in recent years. We also study sliced distances between positive measures in the so-called unbalanced OT problem. Back to the original Euclidean Sliced-Wasserstein distance between probability measures, we study the dynamics of gradient flows when endowing the space with this distance in place of the usual Wasserstein distance. Then, we investigate the use of the Busemann function, a generalization of the inner product in metric spaces, in the space of probability measures. Finally, we extend the subspace detour approach to incomparable spaces using the Gromov-Wasserstein distance.
Machine Learning
What problem does this paper attempt to address?
The paper primarily focuses on the application of Optimal Transport (OT) in machine learning, especially in methods for handling probability distributions on different manifolds. Specifically:

1. **Application of Optimal Transport in Machine Learning**:
   - Optimal Transport allows for the comparison of different probability distributions by finding the cheapest way to move from one distribution to another, making it very useful for handling probability distributions with different supports.
   - The paper explores how Optimal Transport can be applied to generative modeling tasks as an alternative to Kullback-Leibler divergence (KL divergence).
2. **Motivation**:
   - Traditional methods like KL divergence require that the two distributions have densities and share the same support set, which is not always the case in real-world data.
   - Optimal Transport can better preserve the geometric structure of the data and handle distributions that do not have the same support set.
3. **Main Contributions of the Paper**:
   - **Part 1: Application of Sliced Wasserstein Distance on Riemannian Manifolds**:
     - Explores how to define and compute the sliced Wasserstein distance on Riemannian manifolds with non-positive and non-negative curvature.
     - Proposes the sliced Wasserstein distance and its properties in spaces of symmetric positive definite matrices, hyperbolic spaces, and spherical spaces.
   - **Part 2: Achieving Optimal Transport through Projections and Its Variants**:
     - Investigates how to simplify the Optimal Transport problem through projection methods and proposes several new distance metrics, such as the combination of unbalanced Optimal Transport and sliced Wasserstein.
     - Explores the properties of gradient flows in the sliced Wasserstein space and their applications.

Through these theoretical studies and experimental validations, the paper aims to provide new tools and methods for probability distribution modeling in machine learning, particularly when dealing with data with complex geometric structures.
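
To make the projection idea concrete, here is a minimal sketch of a Monte Carlo estimate of the Euclidean Sliced-Wasserstein distance between two point clouds. It is an illustration under stated assumptions and not the thesis's implementation: the function name `sliced_wasserstein` and its parameters are hypothetical, both samples are assumed to have the same number of points with uniform weights, and the Riemannian and unbalanced variants discussed above are not covered.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p between two empirical measures.

    Assumptions (illustrative only): x and y are arrays of shape (n, d)
    with the same n, and every sample carries weight 1/n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch requires x and y to have the same shape")

    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Draw directions uniformly on the unit sphere by normalizing Gaussians.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both samples onto every direction: shape (n, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T

    # In one dimension, optimal transport between equal-size uniform samples
    # matches sorted points, so sorting each column solves each 1D problem.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)

    # Average the p-th power cost over samples and directions, then take the root.
    return np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p)


# Example: two Gaussian point clouds in the plane, one shifted along the first axis.
rng = np.random.default_rng(42)
source = rng.normal(size=(500, 2))
target = source + np.array([3.0, 0.0])
print(sliced_wasserstein(source, target))
```

Each projection costs one sort, so the estimate runs in roughly O(L n log n) time for L directions and n points, which is the computational advantage that motivates slicing. The estimate depends on the number of directions and on the random seed.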