Abstract: Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport for Machine Learning over the period 2012-2023, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer and reinforcement learning. We further highlight the recent development in computational Optimal Transport and its extensions, such as partial, unbalanced, Gromov and Neural Optimal Transport, and its interplay with Machine Learning practice.
What problem does this paper attempt to address?
This paper surveys recent advances in Optimal Transport (OT) and its applications in machine learning. It covers contributions published between 2012 and 2023 in four subfields: supervised learning, unsupervised learning, transfer learning, and reinforcement learning. It also reviews recent developments in computational optimal transport and its extensions, including partial, unbalanced, Gromov, and neural optimal transport, and examines how these methods interact with machine learning practice.
The paper first introduces the basic concept of optimal transport, a probabilistic framework for comparing and manipulating probability distributions. OT induces a metric between distributions, such as the Wasserstein distance, which has proved useful in areas like generative modeling. Next, the paper covers different formulations of OT theory, such as the Monge-Kantorovich formulation and dynamic OT, and discusses how they are applied to practical problems.
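For reference, the standard textbook statements of these formulations can be sketched as follows. The notation is an assumption made here and is not taken from the paper: \mu and \nu are probability distributions on spaces \mathcal{X} and \mathcal{Y}, c is a ground cost, T_{\#}\mu is the push-forward of \mu by a map T, \Pi(\mu, \nu) is the set of couplings with marginals \mu and \nu, and d is a metric on a common space.

```latex
% Monge formulation: search over transport maps T that push \mu onto \nu
\inf_{T \,:\, T_{\#}\mu = \nu} \int_{\mathcal{X}} c\bigl(x, T(x)\bigr)\, d\mu(x)

% Kantorovich relaxation: search over couplings (transport plans) \gamma
\inf_{\gamma \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, d\gamma(x, y)

% p-Wasserstein distance: Kantorovich problem with cost d(x, y)^p
W_p(\mu, \nu) = \Bigl( \inf_{\gamma \in \Pi(\mu, \nu)} \int d(x, y)^p \, d\gamma(x, y) \Bigr)^{1/p}
```

The Kantorovich problem relaxes the Monge problem by allowing mass at one source point to be split across several targets, which is why it is linear in the unknown \gamma.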
Following this, the paper gives a detailed account of methods for computing OT, including discretization strategies, sample-based approximation, and projection-based methods. For example, the Sinkhorn algorithm accelerates OT computation by solving an entropy-regularized version of the problem, while the Sliced Wasserstein distance reduces computational cost by projecting high-dimensional data onto one-dimensional subspaces, where OT is cheap to compute. The paper also reviews structured OT methods, such as OT plans with low-rank structure, which aim to address the curse of dimensionality.
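The two algorithms named above can be illustrated with a minimal NumPy sketch. It is not the paper's code: the function names `sinkhorn` and `sliced_wasserstein`, the parameters `eps`, `n_iters` and `n_proj`, and the toy data are all assumptions made for illustration. The Sinkhorn part uses plain (not log-domain) scaling, so the cost matrix is normalized to avoid underflow, and the sliced part assumes both samples have the same number of points.

```python
import numpy as np


def sinkhorn(a, b, C, eps=0.1, n_iters=1000):
    """Entropy-regularized OT plan between histograms a and b (illustrative sketch)."""
    K = np.exp(-C / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows to match marginal a
        v = b / (K.T @ u)             # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def sliced_wasserstein(X, Y, n_proj=100, p=2, seed=0):
    """Monte Carlo sliced p-Wasserstein distance; X and Y need equal sample counts."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # random unit directions
    # In 1D, OT between equal-size samples matches sorted points to sorted points.
    Xp = np.sort(X @ theta.T, axis=0)
    Yp = np.sort(Y @ theta.T, axis=0)
    return np.mean(np.abs(Xp - Yp) ** p) ** (1.0 / p)


# Toy data: two point clouds in the plane with uniform weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
Y = rng.normal(size=(30, 2)) + 1.0
a = np.full(30, 1.0 / 30)
b = np.full(30, 1.0 / 30)

C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
C = C / C.max()                                          # keep exp(-C / eps) well scaled

P = sinkhorn(a, b, C)
print("regularized transport cost:", np.sum(P * C))
print("sliced Wasserstein distance:", sliced_wasserstein(X, Y))
```

Smaller values of `eps` bring the Sinkhorn plan closer to the unregularized optimum but slow convergence and can underflow in this plain form; log-domain updates are the usual remedy.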
Finally, the paper discusses extensions of OT. Unbalanced OT and partial OT relax the requirement that all mass be transported, which helps when datasets contain outliers. The Gromov-Wasserstein distance allows distributions supported on different spaces to be compared. Weak optimal transport offers a further perspective on OT problems.
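To make the outlier point concrete, here is a hedged sketch of one common way to relax the marginal constraints: entropic unbalanced OT with KL penalties on both marginals, solved by a scaling iteration in which the Sinkhorn updates are damped by the exponent rho / (rho + eps). This is a swapped-in illustration and not necessarily the variant the paper discusses; the name `unbalanced_sinkhorn`, the parameters `eps` and `rho`, and the toy points are assumptions.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, C, eps=0.5, rho=1.0, n_iters=500):
    """Scaling iterations for entropic OT with KL-relaxed marginals (illustrative sketch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    K = np.exp(-C / eps)              # Gibbs kernel of the cost matrix
    f = rho / (rho + eps)             # f -> 1 recovers balanced Sinkhorn as rho grows
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = (a / (K @ v)) ** f        # damped row update: marginal a matched only softly
        v = (b / (K.T @ u)) ** f      # damped column update: marginal b matched only softly
    return u[:, None] * K * v[None, :]


# Toy example: the target contains one point far away from every source point.
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0, 10.0])        # the point at 10 plays the role of an outlier
a = np.full(2, 1.0 / 2)
b = np.full(3, 1.0 / 3)
C = (x[:, None] - y[None, :]) ** 2    # squared distance cost

P = unbalanced_sinkhorn(a, b, C)
print("mass received by each target point:", P.sum(axis=0))
print("target weights:", b)
```

Comparing the column sums of `P` with `b` shows how much of each target point's weight was actually matched. Larger `rho` enforces the marginals more strictly, and smaller `rho` lets the plan discard mass that is expensive to move.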
In summary, this paper aims to systematically review and summarize the applications of optimal transport in machine learning and explore its latest advancements and technical challenges in different subfields.