Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods

Gen Li, Yanxi Chen, Yu Huang, Yuejie Chi, H. Vincent Poor, Yuxin Chen
2024-06-20
Abstract: Efficient computation of the optimal transport distance between two distributions serves as an algorithm subroutine that empowers various applications. This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy with runtime $\widetilde{O}( n^2/\varepsilon)$, where $n$ denotes the dimension of the probability distributions of interest. Our algorithm achieves the state-of-the-art computational guarantees among all first-order methods, while exhibiting favorable numerical performance compared to classical algorithms like Sinkhorn and Greenkhorn. Underlying our algorithm designs are two key elements: (a) converting the original problem into a bilinear minimax problem over probability distributions; (b) exploiting the extragradient idea -- in conjunction with entropy regularization and adaptive learning rates -- to accelerate convergence.
Machine Learning, Data Structures and Algorithms, Information Theory, Optimization and Control
What problem does this paper attempt to address?
The paper addresses the problem of efficiently computing the optimal transport distance between two probability distributions. Specifically, the authors develop a first-order optimization-based method that computes the optimal transport distance to within \(\varepsilon\) additive accuracy in \(\widetilde{O}\left(\frac{n^2}{\varepsilon}\right)\) time, where \(n\) is the dimension of the probability distributions. This guarantee is state-of-the-art among first-order methods, and the method compares favorably in numerical performance with classical algorithms such as Sinkhorn and Greenkhorn.

### Background and Problem Description of the Paper

The optimal transport (OT) problem has wide applications in modern data science: measuring the discrepancy between the model distribution and the real distribution in generative adversarial networks (GANs), evaluating the intrinsic dissimilarity between point clouds in computer graphics, and assessing distribution shift in transfer learning. At the core of all of these is the question of how to quantify the distance between two probability distributions.

Traditional linear programming methods can solve the optimal transport problem, but they become impractical at large scale as the problem dimension grows. For example, the linear-programming-based algorithm requires a running time of \(\widetilde{O}(n^{2.5})\), which is far more than the time required to read the cost matrix \(W\). In contrast, the Sinkhorn iteration exploits the special structure of the entropy-regularized variant to achieve a near-linear-time complexity of \(\widetilde{O}\left(\frac{n^2}{\varepsilon^2}\right)\), but its dependence on \(\varepsilon\) is still not optimal.

### Main Contributions of the Paper

The paper proposes a new algorithm for the optimal transport problem with the following features:

1. **Problem Transformation**: The original problem is converted into a bilinear minimax problem over two sets of probability distributions.
2. **Entropy-Regularized Extragradient Method**: An entropy-regularized extragradient method is designed. Each iteration performs two mirror-descent-type updates, and the learning rates are chosen adaptively according to the corresponding row or column marginal distributions.
3. **Theoretical Guarantee**: The proposed algorithm reaches \(\varepsilon\) additive accuracy in \(\widetilde{O}\left(\frac{1}{\varepsilon}\right)\) iterations, or equivalently a running time of \(\widetilde{O}\left(\frac{n^2}{\varepsilon}\right)\), and is therefore a near-linear-time algorithm.

### Theoretical Analysis and Experimental Results

The paper establishes the convergence properties of the algorithm through rigorous theoretical analysis and verifies its practical effectiveness through numerical experiments. The experimental results show favorable numerical performance compared with the classical Sinkhorn and Greenkhorn algorithms.

### Related Work

The paper also reviews progress in related areas, including the use of entropy regularization in optimization over probability distributions, extragradient methods for saddle-point optimization, and earlier algorithms for the optimal transport problem. These lines of work provide the theoretical basis and technical tools for the paper.

In conclusion, the paper proposes an efficient and practical algorithm for large-scale optimal transport problems. It attains the best known computational guarantees among first-order methods and also performs well in numerical experiments.
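To make the "two mirror-descent-type updates per iteration" idea concrete, here is a minimal sketch of a generic entropic extragradient (mirror-prox style) iteration for a bilinear minimax problem \(\min_x \max_y x^\top A y\) over two probability simplices. This is not the paper's algorithm: it omits the reduction from optimal transport to the minimax form, the entropy regularization of the objective, and the adaptive learning rates, and it uses a fixed step size `eta`. The function names, the payoff matrix `A`, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def mirror_step(log_p, grad, eta):
    """One entropic mirror-descent step on the simplex, kept in the log domain.

    Equivalent to p_new proportional to p * exp(-eta * grad).
    """
    z = log_p - eta * grad
    return z - np.logaddexp.reduce(z)  # subtract log-sum-exp to renormalize


def entropic_extragradient(A, eta=0.1, iters=2000):
    """Approximate a saddle point of min_x max_y x^T A y over simplices.

    Returns the averages of the midpoint iterates (hypothetical helper).
    """
    m, n = A.shape
    log_x = np.full(m, -np.log(m))  # uniform starting point for the min player
    log_y = np.full(n, -np.log(n))  # uniform starting point for the max player
    x_avg = np.zeros(m)
    y_avg = np.zeros(n)
    for _ in range(iters):
        x, y = np.exp(log_x), np.exp(log_y)
        # First update: extrapolate to a midpoint using current gradients.
        log_x_mid = mirror_step(log_x, A @ y, eta)        # x descends
        log_y_mid = mirror_step(log_y, -(A.T @ x), eta)   # y ascends
        x_mid, y_mid = np.exp(log_x_mid), np.exp(log_y_mid)
        # Second update: step from the ORIGINAL point with midpoint gradients.
        log_x = mirror_step(log_x, A @ y_mid, eta)
        log_y = mirror_step(log_y, -(A.T @ x_mid), eta)
        x_avg += x_mid
        y_avg += y_mid
    return x_avg / iters, y_avg / iters


def duality_gap(A, x, y):
    """Best-response gap: max_j (A^T x)_j - min_i (A y)_i, nonnegative."""
    return np.max(A.T @ x) - np.min(A @ y)


if __name__ == "__main__":
    # Small illustrative game; the matrix is an arbitrary example.
    A = np.array([[2.0, -1.0], [-1.0, 1.0]])
    x, y = entropic_extragradient(A)
    print("x =", x, "y =", y, "gap =", duality_gap(A, x, y))
```

The second update reuses the starting point of the iteration but evaluates gradients at the midpoint; that look-ahead is what distinguishes extragradient from plain mirror descent. Working with log-probabilities is a design choice here to avoid underflow in the multiplicative updates.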