Understanding the Topology and the Geometry of the Space of Persistence Diagrams via Optimal Partial Transport

Vincent Divol, Théo Lacombe
DOI: https://doi.org/10.1007/s41468-020-00061-z
2024-05-28
Abstract: Despite the obvious similarities between the metrics used in topological data analysis and those of optimal transport, an optimal-transport based formalism to study persistence diagrams and similar topological descriptors has yet to come. In this article, by considering the space of persistence diagrams as a space of discrete measures, and by observing that its metrics can be expressed as optimal partial transport problems, we introduce a generalization of persistence diagrams, namely Radon measures supported on the upper half plane. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g., persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the persistence diagrams space. We explore topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to Wasserstein metrics, a geometric description of barycenters (Fréchet means) for any distribution of diagrams, and an exhaustive description of continuous linear representations of persistence diagrams. We also showcase the strength of this framework to study random persistence diagrams by providing several statistical results made meaningful thanks to this new formalism.
Computational Geometry, Geometric Topology
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper develops an optimal-transport formalism for persistence diagrams. It treats persistence diagrams as discrete measures and observes that their standard metrics can be written as optimal partial transport problems. This leads to a generalization of persistence diagrams: Radon measures supported on the upper half-plane. Such measures arise naturally in topological data analysis as continuous representations of persistence diagrams (for example, persistence surfaces), as limits in laws of large numbers for persistence diagrams, and as expectations of probability distributions on the space of persistence diagrams.

The main contributions of the paper include:

1. **Study of Topological Properties**: Exploring the topological properties of this new space of measures; these properties also hold for the closed subspace of persistence diagrams.
2. **Description of Convergence**: Characterizing convergence with respect to the Wasserstein metrics.
3. **Geometric Description**: Giving a geometric description of barycenters (Fréchet means) for any distribution of diagrams.
4. **Continuous Linear Representation**: Giving an exhaustive description of the continuous linear representations of persistence diagrams.
5. **Study of Random Persistence Diagrams**: Showing how the framework applies to random persistence diagrams, with several statistical results that become meaningful thanks to the new formalism.

Additionally, the paper uses the extended Wasserstein distance to address laws of large numbers for persistence diagrams built on random point clouds, and it investigates the stability of the expected diagram. Together, these results support the study of topological descriptors and their statistical properties.
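
To make the "optimal partial transport" view concrete for finite diagrams, here is a minimal sketch of the usual partial-matching distance: each point may be matched either to a point of the other diagram or to the diagonal. It is not taken from the paper. The function names `diag_dist` and `diagram_distance`, the Euclidean ground norm, and the brute-force search over matchings are all assumptions made for illustration, and the search is only practical for very small diagrams.

```python
import itertools
import math


def diag_dist(pt):
    """Euclidean distance from a point (birth, death) to the diagonal."""
    birth, death = pt
    return (death - birth) / math.sqrt(2.0)


def diagram_distance(X, Y, p=2.0):
    """p-th order partial-matching distance between two finite diagrams.

    X and Y are sequences of (birth, death) pairs. Each diagram is padded
    with "diagonal slots" so that any point can be sent to the diagonal
    instead of being matched to a point of the other diagram.
    Brute force over all matchings: illustration only, factorial cost.
    """
    X, Y = list(X), list(Y)
    n, m = len(X), len(Y)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        if i < n and j < m:          # point of X matched to point of Y
            return math.dist(X[i], Y[j]) ** p
        if i < n:                    # point of X sent to the diagonal
            return diag_dist(X[i]) ** p
        if j < m:                    # point of Y sent to the diagonal
            return diag_dist(Y[j]) ** p
        return 0.0                   # diagonal slot matched to diagonal slot

    best = min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in itertools.permutations(range(size))
    )
    return best ** (1.0 / p)
```

For example, `diagram_distance([(0, 4)], [(10, 14)])` sends both points to the diagonal rather than matching them to each other, because that is cheaper; this is the "partial" aspect of the transport. For larger diagrams, an assignment solver such as `scipy.optimize.linear_sum_assignment` applied to the same padded cost matrix would be a natural replacement for the brute-force search.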