k-means clustering for persistent homology

Yueqi Cao, Prudence Leung, Anthea Monod
DOI: https://doi.org/10.1007/s11634-023-00578-y
2024-02-01
Advances in Data Analysis and Classification
Abstract: Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features within a dataset as a persistence diagram. It has recently gained much popularity from its myriad successful applications to many domains; however, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we prove convergence of the k-means clustering algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush–Kuhn–Tucker framework. Additionally, we perform numerical experiments on both simulated and real data using various representations of persistent homology, including embeddings of persistence diagrams as well as the diagrams themselves and their generalizations as persistence measures. We find that k-means clustering performed directly on persistence diagrams and measures outperforms clustering on their vectorized representations.
statistics & probability