Parallelizing Maximal Clique Enumeration Over Graph Data.
Qun Chen, Chao Fang, Zhuo Wang, Bo Suo, Zhanhuai Li, Zachary G. Ives
DOI: https://doi.org/10.1007/978-3-319-32049-6_16
2016-01-01
Abstract: In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs (cliques) is an essential component. Unfortunately, this problem is NP-complete and thus computationally intensive at scale. Hence there is a need for techniques that distribute the computation across multiple machines, so that a computation too time-consuming for a single machine can be performed efficiently on a cluster, provided the cluster is large enough. In this paper, we first propose a new approach for maximal clique enumeration, which identifies cliques by recursive graph partitioning. Given a connected graph $G=(V,E)$, it has a space complexity of $O(|E|)$ and a time complexity of $O(|E|\,\mu(G))$, where $\mu(G)$ represents the number of different cliques existing in $G$. It recursively divides a graph until each task is sufficiently small to be processed in parallel. We then develop parallel solutions and demonstrate how graph partitioning can enable effective load balancing. Finally, we evaluate the performance of the proposed approach on real and synthetic graph data and show that it performs considerably better than existing approaches in both centralized and parallel settings. Our parallel algorithms are implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can easily generalize to other shared-nothing or shared-memory parallel frameworks.
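The abstract does not spell out the authors' partitioning rule, so the sketch below illustrates the general idea of splitting maximal clique enumeration into independent tasks using a different, classic technique: Bron–Kerbosch with pivoting, combined with a per-vertex split in which each maximal clique is reported only by the task of its earliest vertex in a fixed order. This is not the paper's algorithm. The function names (`bron_kerbosch`, `split_into_tasks`, `enumerate_maximal_cliques`), the dict-of-sets graph representation, and the `(r, p, x)` task shape are hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

# Hypothetical representation: undirected graph as vertex -> set of neighbours.
Graph = Dict[int, Set[int]]
Task = Tuple[Set[int], Set[int], Set[int]]  # (clique so far, candidates, excluded)


def bron_kerbosch(graph: Graph, r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
    """Classic Bron-Kerbosch with pivoting: yield every maximal clique that
    extends r with vertices from p and contains no vertex from x."""
    if not p and not x:
        yield frozenset(r)
        return
    # Pivot on the vertex with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda u: len(graph[u] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}  # v has been fully explored ...
        x = x | {v}  # ... so later branches must exclude it


def split_into_tasks(graph: Graph) -> List[Task]:
    """Split enumeration into one independent subproblem per vertex.
    The task for v searches only among v's later neighbours and excludes its
    earlier ones, so each maximal clique is found by exactly one task."""
    order = sorted(graph)  # any fixed order is correct; the order affects balance
    pos = {v: i for i, v in enumerate(order)}
    tasks: List[Task] = []
    for v in order:
        later = {u for u in graph[v] if pos[u] > pos[v]}
        earlier = {u for u in graph[v] if pos[u] < pos[v]}
        tasks.append(({v}, later, earlier))
    return tasks


def enumerate_maximal_cliques(graph: Graph) -> Set[FrozenSet[int]]:
    """Run every task (sequentially here; each could go to a separate worker)."""
    cliques: Set[FrozenSet[int]] = set()
    for r, p, x in split_into_tasks(graph):
        cliques.update(bron_kerbosch(graph, r, p, x))
    return cliques


if __name__ == "__main__":
    # Triangle 1-2-3 plus a pendant edge 3-4.
    g: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(sorted(sorted(c) for c in enumerate_maximal_cliques(g)))  # [[1, 2, 3], [3, 4]]
```

Because each task reads only the graph and its own `(r, p, x)` triple, the tasks share no mutable state, which is what makes this kind of split a natural fit for shared-nothing frameworks. Task sizes can be very uneven, however, which is the load-balancing concern the abstract says its recursive partitioning addresses.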