PDOT: a Practical Primal-Dual Algorithm and a GPU-Based Solver for Optimal Transport

Haihao Lu, Jinwen Yang
2024-07-29
Abstract: In this paper, we propose a practical primal-dual algorithm with theoretical guarantees and develop a GPU-based solver, which we dub PDOT, for solving large-scale optimal transport problems. Compared to the Sinkhorn algorithm or classic LP algorithms, PDOT can achieve high-accuracy solutions while efficiently taking advantage of modern computing architectures, i.e., GPUs. On the theoretical side, we show that PDOT has a data-independent $\widetilde O(mn(m+n)^{1.5}\log(\frac{1}{\epsilon}))$ local flop complexity, where $\epsilon$ is the desired accuracy, and $m$ and $n$ are the dimensions of the source and target distributions, respectively. We further present a data-dependent $\widetilde O(mn(m+n)^{3.5}\Delta + mn(m+n)^{1.5}\log(\frac{1}{\epsilon}))$ global flop complexity of PDOT, where $\Delta$ is the precision of the data. On the numerical side, we present PDOT, an open-source GPU solver based on the proposed algorithm. Our extensive numerical experiments consistently demonstrate that PDOT strikes a good balance between computing efficiency and solution accuracy, compared to Gurobi and Sinkhorn algorithms.
Optimization and Control
What problem does this paper attempt to address?
The paper develops an efficient algorithm for solving large-scale Optimal Transport (OT) problems. Specifically, the authors propose a practical primal-dual algorithm and a GPU-based solver, PDOT, built on it. Compared to the Sinkhorn algorithm or classical Linear Programming (LP) algorithms, PDOT can solve OT problems to high accuracy while making effective use of modern computing architectures such as GPUs.

From a theoretical perspective, PDOT has a data-independent local flop complexity of \(\widetilde O(mn(m + n)^{1.5}\log(\frac{1}{\epsilon}))\), where \(m\) and \(n\) are the dimensions of the source and target distributions, respectively, and \(\epsilon\) is the desired accuracy. The paper also gives a data-dependent global flop complexity of \(\widetilde O(mn(m+n)^{3.5}\Delta + mn(m+n)^{1.5}\log(\frac{1}{\epsilon}))\), where \(\Delta\) is the precision of the data.

On the numerical side, the authors release PDOT as an open-source GPU solver based on the proposed algorithm. Their experiments indicate that PDOT achieves a good balance between computational efficiency and solution accuracy when compared with Gurobi and Sinkhorn algorithms.

In summary, the paper aims to provide a practical algorithm for large-scale OT problems that reaches high-accuracy solutions and can be implemented efficiently on GPUs, which matters most in scenarios where a high-accuracy solution is required.
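
To make the accuracy comparison with Sinkhorn concrete, the following is a minimal sketch of the textbook Sinkhorn baseline, not of PDOT itself, whose update rules are not described in this summary. It assumes the standard discrete OT formulation, $\min_{X \ge 0} \langle C, X \rangle$ subject to row sums of $X$ equal to $a$ and column sums equal to $b$. The names `cost`, `a`, `b`, `reg`, and the helper functions are illustrative assumptions. On a 2-by-2 problem whose exact LP optimum has cost 0, the entropic regularization parameter `reg` leaves a strictly positive cost that shrinks only as `reg` is reduced.

```python
import math

def sinkhorn(cost, a, b, reg, iters=200):
    """Entropic-regularized OT via Sinkhorn scaling (illustrative baseline, not PDOT)."""
    m, n = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(n)] for i in range(m)]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals a and b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    # Transport plan X = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]

def transport_cost(cost, plan):
    """Linear OT objective <C, X>."""
    return sum(c * x for c_row, x_row in zip(cost, plan) for c, x in zip(c_row, x_row))

def marginal_error(plan, a, b):
    """Largest absolute violation of the row-sum and column-sum constraints."""
    row = max(abs(sum(r) - ai) for r, ai in zip(plan, a))
    col = max(abs(sum(plan[i][j] for i in range(len(a))) - b[j]) for j in range(len(b)))
    return max(row, col)

# Toy instance: the exact LP optimum moves no mass off the diagonal, so its cost is 0.
cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]
b = [0.5, 0.5]

for reg in (1.0, 0.1):
    plan = sinkhorn(cost, a, b, reg)
    print(reg, transport_cost(cost, plan), marginal_error(plan, a, b))
```

In this toy case the marginals are matched almost exactly, yet the objective stays above the LP optimum for any positive `reg`. This is the kind of accuracy gap that an LP-based method targeting the unregularized problem is meant to close.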