Scalability of Betweenness Approximation Algorithms: An Experimental Review
Sebastian Wandelt, Xing Shi, Xiaoqian Sun
DOI: https://doi.org/10.1109/access.2019.2927681
IF: 3.9
2019-01-01
IEEE Access
Abstract: Betweenness centrality, which measures the contribution of an individual node to the network's connectivity by counting the fraction of shortest paths between other node pairs that pass through it, is widely used in the analysis of complex networks. Computing exact betweenness centrality is prohibitively expensive for large networks, given a worst-case complexity of O(N * E), where N is the number of nodes and E is the number of edges in the network. Accordingly, many approximation algorithms have been proposed in the literature. Obtaining an overview of the state of the art is difficult, given the many possible combinations of algorithms, parameters, and network topologies. In this paper, we report the results of what is probably the largest benchmark performed in this field. Specifically, we select 100 networks with distinct topologies and scales, covering various domains. We devise and compare eight measures for evaluating the accuracy of an approximation against the exact betweenness values. All experiments, including those to obtain the exact betweenness values, were performed on a single computer using a single thread, to ensure a fair comparison. We implemented typical approximation methods and report the results of a sensitivity analysis over a variety of parameters. We find that uniformly random sampling, one of the earliest methods proposed in this field, still delivers the best performance, striking a good balance between approximation quality and runtime. In addition, we carried out robustness experiments based on the ranking order of the approximated betweenness values, to show the effect of different approximations on a real-world task. Our study aims to serve as a reference for choosing a betweenness approximation method, taking into account the network type, the required level of accuracy, and the available computational resources.
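To make the cost trade-off above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes an unweighted graph given as an adjacency list over vertices 0..N-1, where the betweenness of v is BC(v) = Σ over s ≠ v ≠ t of σ_st(v)/σ_st, with σ_st the number of shortest s-t paths and σ_st(v) the number of those passing through v. The exact values use Brandes' accumulation (one BFS per source, hence O(N * E)). Because the abstract does not spell out which uniform sampling scheme was benchmarked, the approximation here is an assumption: it runs the same per-source pass for only k sources drawn uniformly without replacement and rescales by N/k. All function names are hypothetical.

```python
import random
from collections import deque


def _accumulate_from_source(adj, s, bc):
    """One Brandes pass: BFS from s, then add s's dependencies into bc."""
    n = len(adj)
    sigma = [0] * n               # number of shortest s-v paths
    dist = [-1] * n               # BFS distance from s (-1 = unvisited)
    preds = [[] for _ in range(n)]
    sigma[s], dist[s] = 1, 0
    order = []                    # vertices in non-decreasing distance from s
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = [0.0] * n             # dependency of source s on each vertex
    for w in reversed(order):     # farthest vertices first
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
        if w != s:
            bc[w] += delta[w]


def exact_betweenness(adj):
    """Exact betweenness over ordered pairs (s, t): one BFS per source."""
    bc = [0.0] * len(adj)
    for s in range(len(adj)):
        _accumulate_from_source(adj, s, bc)
    return bc


def sampled_betweenness(adj, k, seed=None):
    """Assumed estimator: k uniformly sampled sources, rescaled by N/k."""
    n = len(adj)
    rng = random.Random(seed)
    bc = [0.0] * n
    for s in rng.sample(range(n), k):   # k distinct sources, uniform
        _accumulate_from_source(adj, s, bc)
    scale = n / k                       # each source is included w.p. k/N
    return [b * scale for b in bc]


if __name__ == "__main__":
    path = [[1], [0, 2], [1, 3], [2, 4], [3]]   # path graph 0-1-2-3-4
    print(exact_betweenness(path))              # [0.0, 6.0, 8.0, 6.0, 0.0]
    print(sampled_betweenness(path, 2, seed=0))
```

Since every source is selected with probability k/N, rescaling by N/k keeps the estimate unbiased, while the runtime drops to roughly k/N of the exact computation; with k = N it reproduces the exact values. The values count ordered pairs, so on undirected graphs they are twice the unordered-pair convention.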