Fast and Memory-Efficient Approximate Minimum Spanning Tree Generation for Large Datasets

Mahmood K. M. Almansoori,Andras Meszaros,Miklos Telek
DOI: https://doi.org/10.1007/s13369-024-08974-y
IF: 2.807
2024-06-22
Arabian Journal for Science and Engineering
Abstract:Conventional minimum spanning tree (MST) algorithms typically start by creating a distance matrix of the pairs of data points, leading to a time complexity of . This initial step poses a computational bottleneck. To overcome this limitation, we present a novel method that constructs an initial random k -neighbor graph and optimizes this graph by employing a crawling technique to efficiently approximate the k Nearest Neighbors ( k NN) graph. This crawling approach allows us to approximate the closest neighbors of each node. Subsequently, the approximate k NN graph is utilized to build an initial approximate MST and iteratively refine it by the same crawling process. Using this approach, an approximate MST can be obtained for a data set of size n with empirical cost around and a minimal O ( n ) memory consumption. In contrast to spatial tree-based approaches, the presented method also scales well to high dimensional data. We have shown that the proposed approach achieves such a level of performance with only a marginal accuracy reduction between 0.5% and 6%. These qualities make it an excellent candidate for approximate MST calculation for high-dimensional, large data sets.
multidisciplinary sciences
What problem does this paper attempt to address?