Faster Parallel Triangular Maximally Filtered Graphs and Hierarchical Clustering

Steven Raphael,Julian Shun
2024-08-18
Abstract:Filtered graphs provide a powerful tool for data clustering. The triangular maximally filtered graph (TMFG) method, when combined with the directed bubble hierarchy tree (DBHT) method, defines a useful algorithm for hierarchical data clustering. This combined TMFG-DBHT algorithm has been shown to produce clusters with good accuracy for time series data, but the previous state-of-the-art parallel algorithm has limited parallelism. This paper presents an improved parallel algorithm for TMFG-DBHT. Our algorithm increases the amount of parallelism by aggregating the bulk of the work of TMFG construction together to reduce the overheads of parallelism. Furthermore, our TMFG algorithm updates information lazily, which reduces the overall work. We find further speedups by computing all-pairs shortest paths approximately instead of exactly in DBHT. We show experimentally that our algorithm gives a 3.7--10.7x speedup over the previous state-of-the-art TMFG-DBHT implementation, while preserving clustering accuracy.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?