A Fast Parallel Community Discovery Model on Complex Networks Through Approximate Optimization
Shaojie Qiao, Nan Han, Yunjun Gao, Rong-Hua Li, Jianbin Huang, Jun Guo, Louis Alberto Gutierrez, Xindong Wu
DOI: https://doi.org/10.1109/tkde.2018.2803818
IF: 9.235
2018-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Community discovery plays an essential role in analyzing the structural features of complex networks. As online networks grow increasingly large and complex over time, the methods traditionally used for community discovery cannot efficiently handle large-scale network data. This raises the important problem of how to effectively and efficiently discover large communities in complex networks. In this study, we propose a fast parallel community discovery model called picaso (a parallel community discovery algorithm based on approximate optimization), which integrates two new techniques: (1) the Mountain model, which uses graph theory to approximate the selection of nodes to be merged, and (2) the Landslide algorithm, which updates the modularity increment based on the approximate optimization. In addition, the GraphX distributed computing framework is employed to achieve parallel community detection over complex networks. In the proposed model, modularity-based clustering is used to initialize the Mountain model and to compute the weight of each edge in the network. The relationships among the communities are then simplified by applying the Landslide algorithm, which yields the community structure of the complex network. Extensive experiments were conducted on real and synthetic complex network datasets, and the results demonstrate that the proposed algorithm can outperform state-of-the-art methods in both effectiveness and efficiency on the community detection problem. Moreover, we demonstrate that picaso runs approximately four times faster overall than similar approaches. Overall, our results suggest a new paradigm for large-scale community discovery in complex networks.
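
The abstract relies on modularity and on a modularity increment that is updated as communities merge. As background only, the following minimal Python sketch computes the standard Newman modularity of a partition and the increment obtained by merging two communities in an undirected, unweighted graph. It is not the Mountain model or the Landslide algorithm, whose details the abstract does not give, and it is sequential rather than GraphX-based. The function names `modularity` and `merge_gain` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Standard Newman modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in community c
    degree = defaultdict(int)    # sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def merge_gain(edges, community, a, b):
    """Modularity increment from merging two distinct communities a and b.

    Uses the closed form dQ = E_ab / m - d_a * d_b / (2 * m^2), where E_ab
    is the number of edges between a and b and d_a, d_b are degree sums.
    """
    m = len(edges)
    between = 0
    deg_a = deg_b = 0
    for u, v in edges:
        cu, cv = community[u], community[v]
        deg_a += (cu == a) + (cv == a)
        deg_b += (cu == b) + (cv == b)
        if {cu, cv} == {a, b}:
            between += 1
    return between / m - deg_a * deg_b / (2 * m * m)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

q_before = modularity(edges, two_triangles)
gain = merge_gain(edges, two_triangles, "A", "B")
print(q_before, gain)
```

In this toy graph the increment for merging the two triangles is negative, so a greedy modularity-based method would keep them as separate communities. Computing the increment from local counts instead of recomputing the full modularity is the kind of update that agglomerative methods depend on.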