Parallelizing Maximal Clique and K-Plex Enumeration over Graph Data
Zhuo Wang,Qun Chen,Boyi Hou,Bo Suo,Zhanhuai Li,Wei Pan,Zachary G. Ives
DOI: https://doi.org/10.1016/j.jpdc.2017.03.003
IF: 4.542
2017-01-01
Journal of Parallel and Distributed Computing
Abstract:In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs cliques and k-plex is an essential component. Unfortunately, these problems are NP-Complete and thus computationally intensive at scale hence there is a need to come up with techniques for distributing the computation across multiple machines such that the computation, which is too time-consuming on a single machine, can be efficiently performed on a machine cluster given that it is large enough.In this paper, we first propose a new approach for maximal clique and k-plex enumeration, which identifies dense subgraphs by binary graph partitioning. Given a connected graph G = (V, E), it has a space complexity of O(|E|) and a time complexity of O(|E|mu(G)), where mu(G) represents the number of different cliques (k-plexes) existing in G. It recursively divides a graph until each task is sufficiently small to be processed in parallel. We then develop parallel solutions and demonstrate how graph partitioning can enable effective load balancing. Finally, we evaluate the performance of the proposed approach on real and synthetic graph data and show that it performs considerably better than existing approaches in both centralized and parallel settings. In the parallel setting, it can achieve the speedups of up to 10x over existing approaches on large graphs. Our parallel algorithms are primarily implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can easily generalize to other shared nothing or shared-memory parallel frameworks. The work presented in this paper is an extension of our preliminary work on the approach of binary graph partitioning for maximal clique enumeration. In this work, we extend the proposed approach to handle maximal k-plex detection as well. (C) 2017 Elsevier Inc. All rights reserved.