Parallel Algorithm for Discovering Communities in Large-Scale Complex Networks
Shao-Jie QIAO,Jun GUO,Nan HAN,Xiao-Song ZHANG,Chang-An YUAN,Chang-Jie TANG
DOI: https://doi.org/10.11897/SP.J.1016.2017.00687
2017-01-01
Chinese Journal of Computers
Abstract: As networks grow larger, traditional community discovery algorithms can no longer process large-scale network data effectively or efficiently. Based on the Spark distributed graph computing model, this study proposes a parallel algorithm for discovering communities in large-scale complex networks, called DBCS (Discovering Big Community on Spark). The approach follows the basic idea of modularity-based clustering: it first calculates the modularity increment for each node pair, then iteratively finds the maximum modularity increment among all node pairs, merges the corresponding pair, and updates the modularity increments of the remaining nodes, in order to identify the communities in large-scale complex networks. Extensive experiments on several real and synthetic network datasets demonstrate that DBCS can effectively partition large-scale networks that traditional algorithms cannot handle. In particular, it takes only about four minutes to perform community discovery on a network of more than one million nodes. In addition, its time cost is reduced to 1/20 of that of the Hadoop-based parallel algorithm, and its accuracy is improved by 7.4% compared with traditional community discovery algorithms.
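The merge loop outlined in the abstract (compute modularity increments, pick the maximum, merge, update) can be illustrated with a minimal single-machine sketch. This is not the DBCS implementation and does not use Spark. It assumes an undirected, unweighted graph with integer node labels and no self-loops or duplicate edges, and it uses the standard modularity gain for merging communities i and j, ΔQ = 2(e_ij − a_i·a_j), which the abstract does not spell out. The function name `greedy_modularity_merge` is hypothetical.

```python
from collections import defaultdict


def greedy_modularity_merge(edges):
    """Sequential sketch of greedy modularity-based community merging.

    Assumptions (not taken from the paper): `edges` is a list of (u, v)
    pairs with integer labels, no self-loops, no duplicate edges.
    Returns the communities as a sorted list of sorted node lists.
    """
    two_m = 2 * len(edges)
    links = defaultdict(lambda: defaultdict(int))  # edges between communities
    degree = defaultdict(int)                      # total degree per community
    members = {}                                   # community id -> node set

    # Start with every node in its own community.
    for u, v in edges:
        links[u][v] += 1
        links[v][u] += 1
        degree[u] += 1
        degree[v] += 1
        members.setdefault(u, {u})
        members.setdefault(v, {v})

    def delta_q(a, b):
        # Modularity increment for merging connected communities a and b.
        return 2.0 * (links[a][b] / two_m - degree[a] * degree[b] / two_m ** 2)

    while True:
        # Find the connected pair with the largest positive increment.
        best_gain, best_pair = 0.0, None
        for a in links:
            for b in links[a]:
                if a < b:
                    gain = delta_q(a, b)
                    if gain > best_gain:
                        best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves modularity

        # Merge community b into community a and update neighbour links.
        a, b = best_pair
        for c, weight in links.pop(b).items():
            del links[c][b]
            if c != a:
                links[a][c] += weight
                links[c][a] += weight
        degree[a] += degree.pop(b)
        members[a] |= members.pop(b)

    return sorted(sorted(group) for group in members.values())


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(greedy_modularity_merge(toy_edges))
```

The sketch rescans every community pair on each iteration, which is exactly the cost that becomes prohibitive on large graphs. A distributed version would instead compute and update the increments in parallel across graph partitions, as the abstract describes for DBCS.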