Hilbert Curve Projection Distance for Distribution Comparison

Tao Li, Cheng Meng, Hongteng Xu, Jun Yu
2024-02-06
Abstract: Distribution comparison plays a central role in many machine learning tasks like data classification and generative modeling. In this study, we propose a novel metric, called Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with low complexity. In particular, we first project two high-dimensional probability distributions using Hilbert curve to obtain a coupling between them, and then calculate the transport distance between these two distributions in the original space, according to the coupling. We show that HCP distance is a proper metric and is well-defined for probability measures with bounded supports. Furthermore, we demonstrate that the modified empirical HCP distance with the $L_p$ cost in the $d$-dimensional space converges to its population counterpart at a rate of no more than $O(n^{-1/2\max\{d,p\}})$. To suppress the curse-of-dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of comparing two probability distributions efficiently and accurately in machine learning tasks. Specifically, the authors propose a new metric, the Hilbert curve projection (HCP) distance, for measuring the distance between two high-dimensional probability distributions with low complexity. The Wasserstein distance is theoretically attractive but computationally expensive in practice, while approximations such as the sliced Wasserstein (SW) distance may not preserve the structure of the original distributions.

The main contributions of the paper include:

1. **Proposing the HCP distance**: The two high-dimensional distributions are projected onto a one-dimensional space along the Hilbert curve to obtain a coupling between them, and the transport distance is then calculated in the original space according to that coupling.
2. **Theoretical analysis**: The HCP distance is proven to be a proper metric and to be well-defined for probability measures with bounded supports. It is also shown to be an upper bound of the p-Wasserstein distance.
3. **Computational efficiency**: The computational complexity of the HCP distance is close to linear, making it suitable for large-scale datasets. Compared to the SW distance, the HCP distance has an advantage in computation speed.
4. **Experimental validation**: Experiments on synthetic and real-world data validate the effectiveness and efficiency of the HCP distance, in particular in generative modeling and data classification tasks, where it compares favorably with existing methods.

In summary, the paper aims to provide an efficient and effective distance between probability distributions that overcomes the shortcomings of existing methods in computational complexity and structure preservation.
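The projection-then-coupling idea described above can be sketched in a few lines. This is a minimal illustration under several assumptions, not the authors' implementation: it handles only 2-D data already scaled to $[0, 1]^2$, two samples of equal size with uniform weights, and a grid of $2^{\text{order}}$ cells per axis. The function names (`hilbert_index_2d`, `hilbert_order`, `hcp_distance_2d`) and the `order` parameter are hypothetical, and the Hilbert index is computed with the classic bit-manipulation routine for 2-D grids.

```python
import numpy as np


def hilbert_index_2d(x, y, order):
    """Position of integer cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-quadrant is traversed in canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_order(points, order=10):
    """Indices that sort points in [0, 1]^2 along the Hilbert curve."""
    n = 1 << order
    # Quantize each coordinate to a grid cell (assumption: data lies in [0, 1]^2).
    cells = np.clip((points * n).astype(np.int64), 0, n - 1)
    keys = np.array([hilbert_index_2d(int(cx), int(cy), order) for cx, cy in cells])
    return np.argsort(keys, kind="stable")


def hcp_distance_2d(X, Y, p=2, order=10):
    """Sketch of an empirical HCP-style distance between two equal-size 2-D samples."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim != 2 or X.shape[1] != 2 or X.shape != Y.shape:
        raise ValueError("X and Y must both have shape (n, 2)")
    # Coupling: the k-th point of X along the curve is matched to the k-th point of Y.
    Xs = X[hilbert_order(X, order)]
    Ys = Y[hilbert_order(Y, order)]
    # Transport cost is evaluated in the original space, not on the 1-D indices.
    costs = np.linalg.norm(Xs - Ys, axis=1) ** p
    return float(np.mean(costs) ** (1.0 / p))
```

Because the pairing obtained from the curve ordering is one valid coupling among many, the resulting cost can never fall below the optimal-transport cost for the same samples, which matches the upper-bound property stated in the summary. Sorting dominates the running time, so the sketch scales as $O(n \log n)$ in the sample size for a fixed `order`.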