Abstract: This paper presents an efficient algorithm for the progressive approximation of Wasserstein barycenters of persistence diagrams, with applications to the visual analysis of ensemble data. Given a set of scalar fields, our approach enables the computation of a persistence diagram which is representative of the set, and which visually conveys the number, data ranges and saliences of the main features of interest found in the set. Such representative diagrams are obtained by computing explicitly the discrete Wasserstein barycenter of the set of persistence diagrams, a notoriously computationally intensive task. In particular, we revisit efficient algorithms for Wasserstein distance approximation [12,51] to extend previous work on barycenter estimation [94]. We present a new fast algorithm, which progressively approximates the barycenter by iteratively increasing the computation accuracy as well as the number of persistent features in the output diagram. This progressivity drastically improves convergence in practice and makes it possible to design an interruptible algorithm, capable of respecting computation time constraints. This enables the approximation of Wasserstein barycenters within interactive times. We present an application to ensemble clustering where we revisit the k-means algorithm to exploit our barycenters and compute, within execution time constraints, meaningful clusters of ensemble data along with their barycenter diagram. Extensive experiments on synthetic and real-life data sets report that our algorithm converges to barycenters that are qualitatively meaningful with regard to the applications, and quantitatively comparable to previous techniques, while offering an order of magnitude speedup when run until convergence (without time constraint). Our algorithm can be trivially parallelized to provide additional speedups in practice on standard workstations. [...]
Graphics, Computational Geometry, Computer Vision and Pattern Recognition, Data Structures and Algorithms
What problem does this paper attempt to address?
The paper addresses the following problem: given a large ensemble data set, how can one efficiently compute a Wasserstein barycenter that represents a set of persistence diagrams, so as to summarize and visualize the trends of the main features in the ensemble? Specifically:
1. **Background problems**:
- In many scientific and engineering fields, modern numerical simulations generate large amounts of ensemble data, in which each member describes the same phenomenon under different input conditions and parameters.
- A persistence diagram can be computed for each ensemble member, but directly analyzing the resulting collection of diagrams remains very complex.
2. **Limitations of existing methods**:
- Simple methods, such as computing the pointwise average of the scalar fields of the ensemble members and then generating the persistence diagram of that average, produce a diagram with an incorrect number of features, which therefore represents the original ensemble poorly.
- Existing Wasserstein barycenter algorithms (such as the method of Turner et al. [94]) yield more accurate results, but their computational cost is very high, which makes them difficult to apply to real large-scale data sets.
3. **Solution proposed in the paper**:
- The paper introduces a fast progressive approximation algorithm for computing the discrete Wasserstein barycenter, which overcomes the computational bottleneck of existing methods.
- The algorithm significantly accelerates convergence by progressively increasing both the computation accuracy and the number of features in the output diagram, and it can provide meaningful results within a limited time.
- In addition, the algorithm is interruptible and can produce approximate results within interactive times, which suits scenarios that require rapid feedback.
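
The first limitation above can be illustrated with a toy sketch. The 1D fields below are hypothetical data (not taken from the paper), and counting local maxima is used here only as a simple stand-in for counting persistent features.

```python
def count_local_maxima(values):
    """Count strict interior local maxima of a 1D sampled field."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )

# Two hypothetical ensemble members: the same single bump at different positions.
member_a = [0, 1, 4, 1, 0, 0, 0, 0, 0]
member_b = [0, 0, 0, 0, 0, 1, 4, 1, 0]

# Pointwise average of the two scalar fields.
mean_field = [(a + b) / 2 for a, b in zip(member_a, member_b)]

maxima_a = count_local_maxima(member_a)
maxima_b = count_local_maxima(member_b)
maxima_mean = count_local_maxima(mean_field)
```

Each member contains one bump, yet the averaged field contains two bumps at half the original height, so a diagram computed from the average would misreport both the number and the salience of the features.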
### Main contributions of the paper
1. **Progressive Wasserstein barycenter algorithm**:
- A new algorithm based on the progressive approximation strategy is proposed, which gradually improves the calculation precision and output details.
- The most significant features are given priority, and the noise features are processed last.
- Experiments show that the algorithm is an order of magnitude faster than the fastest combinations of existing techniques, and that it is easy to parallelize.
2. **Interruptible persistence diagram clustering algorithm**:
- Building on the progressive barycenter algorithm, the k-means algorithm is extended to cluster persistence diagrams in an interruptible manner.
- The clustering supports the visual exploration of global feature trends in ensemble data.
3. **Implementation**:
- A lightweight C++ implementation is provided to facilitate the reproduction of experimental results.
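
The "most salient features first, within a time budget" idea from the first contribution can be sketched as follows. This is a minimal illustration of a progressive schedule only, not the paper's algorithm: the function names, the halving of the persistence threshold `rho`, and the stopping value `rho_min` are all assumptions made for this example, and the barycenter refinement itself is left out.

```python
import time

def persistence(point):
    """Persistence of a diagram point given as (birth, death)."""
    birth, death = point
    return death - birth

def progressive_levels(diagrams, time_budget_s, rho_min):
    """Return the (rho, filtered diagrams) levels reached within the budget.

    Assumed schedule: rho starts at half the largest persistence and is
    halved at each level, so salient pairs enter first and noise last.
    Expects at least one non-empty diagram.
    """
    deadline = time.monotonic() + time_budget_s
    rho = max(persistence(p) for d in diagrams for p in d) / 2.0
    levels = []
    while True:
        # Keep only the pairs that are more persistent than the threshold.
        filtered = [[p for p in d if persistence(p) > rho] for d in diagrams]
        levels.append((rho, filtered))
        # A real implementation would refine the barycenter estimate here.
        if rho <= rho_min or time.monotonic() >= deadline:
            break  # finished, or interrupted by the time constraint
        rho = max(rho / 2.0, rho_min)
    return levels

# Hypothetical input diagrams, each a list of (birth, death) pairs.
diagrams = [
    [(0.0, 10.0), (1.0, 3.0), (2.0, 2.2)],
    [(0.0, 8.0), (4.0, 5.0)],
]
full = progressive_levels(diagrams, time_budget_s=60.0, rho_min=0.1)
interrupted = progressive_levels(diagrams, time_budget_s=0.0, rho_min=0.1)
```

With a generous budget the schedule runs until every pair above `rho_min` has been introduced; with a zero budget it stops after the first level, which contains only the most persistent pair of each diagram. An interrupted run therefore still returns a usable, coarser result.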
### Mathematical formulas
- Definition of Wasserstein distance:
\[
W_q(D(f), D(g))=\left(\min_{\phi\in\Phi}\sum_{a\in D(f)}d_q(a,\phi(a))^q\right)^{1 / q}
\]
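where \(d_q(a, b)=(|x_b - x_a|^q+|y_b - y_a|^q)^{1 / q}\) is the \(L_q\) distance between two points, and \(\Phi\) is the set of candidate assignments \(\phi\) between the points of the two diagrams (a point may also be matched to the diagonal).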
- Geometrically lifted point-to-point distance:
\[
\hat{d}_2(a, b)=\sqrt{(1-\alpha)d_2(a, b)^2+\alpha\left\|p_\lambda^a - p_\lambda^b\right\|^2_2}
\]
where \(\alpha\in[0,1]\) controls the weight given to the geometric layout, and \(p_\lambda^a\) is a linear combination, with coefficient \(\lambda\), of the domain positions of the two critical points paired by \(a\).
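
Both formulas can be checked on a small example. The sketch below is a brute-force illustration, not the approximation scheme used in the paper: it enumerates every assignment, so it is only usable for a handful of points. Padding each diagram with diagonal slots for the points of the other is the usual construction and is assumed here, as are all function names and the sample coordinates.

```python
import math
from itertools import permutations

def d_q(a, b, q=2):
    """L_q distance between two diagram points (birth, death)."""
    return (abs(b[0] - a[0]) ** q + abs(b[1] - a[1]) ** q) ** (1.0 / q)

def diagonal_projection(a):
    """Closest point to `a` on the diagonal birth == death."""
    mid = (a[0] + a[1]) / 2.0
    return (mid, mid)

def wasserstein(diag_f, diag_g, q=2):
    """Brute-force W_q over all assignments of the padded diagrams."""
    n, m = len(diag_f), len(diag_g)
    size = n + m  # each diagram is padded with diagonal slots for the other

    def cost(i, j):
        if i < n and j < m:  # point matched to point
            return d_q(diag_f[i], diag_g[j], q) ** q
        if i < n:  # point of D(f) sent to the diagonal
            return d_q(diag_f[i], diagonal_projection(diag_f[i]), q) ** q
        if j < m:  # point of D(g) sent to the diagonal
            return d_q(diag_g[j], diagonal_projection(diag_g[j]), q) ** q
        return 0.0  # diagonal slot matched to diagonal slot

    best = min(
        sum(cost(i, phi[i]) for i in range(size))
        for phi in permutations(range(size))
    )
    return best ** (1.0 / q)

def lifted_distance(a, b, pos_a, pos_b, alpha):
    """Geometrically lifted distance; pos_a and pos_b play the role of p_lambda."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    geo_sq = sum((pa - pb) ** 2 for pa, pb in zip(pos_a, pos_b))
    return math.sqrt((1.0 - alpha) * d_q(a, b, 2) ** 2 + alpha * geo_sq)

# Hypothetical diagrams and domain positions.
diag_f = [(0.0, 4.0), (1.0, 2.0)]
diag_g = [(0.0, 5.0)]
w2 = wasserstein(diag_f, diag_g)

lifted = lifted_distance(
    (0.0, 4.0), (0.0, 5.0), (0.0, 0.0, 0.0), (3.0, 4.0, 0.0), alpha=0.5
)
```

In this example the optimal assignment matches \((0, 4)\) to \((0, 5)\) at cost 1 and sends \((1, 2)\) to the diagonal at cost 0.5, so \(W_2=\sqrt{1.5}\). For the lifted distance, \(\alpha=0\) recovers \(d_2\) and \(\alpha=1\) keeps only the geometric term; with \(\alpha=0.5\) the two points above are at distance \(\sqrt{13}\).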
With these improvements, the paper provides a practical method for efficiently computing Wasserstein barycenters, which makes large-scale ensemble data considerably easier to process.