Efficient Maximal Clique Enumeration Over Graph Data

Boyi Hou, Zhuo Wang, Qun Chen, Bo Suo, Chao Fang, Zhanhuai Li, Zachary G. Ives
DOI: https://doi.org/10.1007/s41019-017-0033-5
2017-01-01
Data Science and Engineering
Abstract: In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs (cliques) is an essential component. Unfortunately, this problem is NP-complete and thus computationally intensive at scale. There is therefore a need for efficient processing, as well as for techniques that distribute the computation across multiple machines, so that a computation too time-consuming for a single machine can be performed efficiently on a sufficiently large cluster. In this paper, we propose a new algorithm (called GP) for maximal clique enumeration. It identifies cliques by binary graph partitioning, which iteratively divides a graph until each task is sufficiently small to be processed in parallel. Given a connected graph G = (V, E), the GP algorithm has a space complexity of O(|E|) and a time complexity of O(|E| μ(G)), where μ(G) represents the number of different cliques existing in G. We also present a hybrid algorithm, which effectively leverages the advantages of both the GP algorithm and the classical Bron-Kerbosch (BK) algorithm. We then develop corresponding parallel solutions based on the GP and hybrid algorithms. Finally, we evaluate the performance of the proposed solutions on real and synthetic graph data. Our extensive experiments show that, in both centralized and parallel settings, the proposed GP and hybrid approaches achieve considerably better performance than the state-of-the-art BK approach. Our parallel solutions are implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can easily generalize to other shared-nothing or shared-memory parallel frameworks.
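For orientation, the classical BK baseline that the abstract compares against can be sketched in a few lines. The following is a minimal Python sketch of the textbook Bron-Kerbosch procedure with pivoting; it is not the paper's GP or hybrid algorithm, and the function name `bron_kerbosch` and the adjacency-set graph representation are assumptions made purely for illustration.

```python
from typing import Dict, FrozenSet, Iterator, Optional, Set

# Assumed representation: undirected graph as vertex -> set of neighbours.
Graph = Dict[int, Set[int]]


def bron_kerbosch(
    graph: Graph,
    r: Optional[Set[int]] = None,
    p: Optional[Set[int]] = None,
    x: Optional[Set[int]] = None,
) -> Iterator[FrozenSet[int]]:
    """Yield every maximal clique of `graph` (textbook BK with pivoting).

    r: vertices in the clique being grown
    p: candidates that can still extend r
    x: vertices already explored, used to reject non-maximal cliques
    """
    if r is None:
        r, p, x = set(), set(graph), set()

    if not p and not x:
        # Nothing can extend r and nothing excluded is adjacent to all of r,
        # so r is maximal.
        yield frozenset(r)
        return
    if not p:
        # r is contained in a clique that was already reported.
        return

    # Pivot on the vertex with the most neighbours among the candidates;
    # only non-neighbours of the pivot need to be branched on.
    pivot = max(p | x, key=lambda u: len(graph[u] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p.remove(v)
        x.add(v)


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle 1-2-3 with a pendant edge 3-4.
    toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    for clique in bron_kerbosch(toy):
        print(sorted(clique))
```

On the toy graph, the two maximal cliques are {1, 2, 3} and {3, 4}. The recursion keeps only the three vertex sets per call, which is why BK is a common single-machine baseline for this problem.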