Abstract: In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs (cliques and k-plexes) is an essential component. Unfortunately, these problems are NP-complete and thus computationally intensive at scale. Hence, there is a need for techniques that distribute the computation across multiple machines, so that a computation that is too time-consuming on a single machine can be performed efficiently on a sufficiently large machine cluster. In this paper, we first propose a new approach for maximal clique and k-plex enumeration, which identifies dense subgraphs by binary graph partitioning. Given a connected graph G = (V, E), it has a space complexity of O(|E|) and a time complexity of O(|E| mu(G)), where mu(G) is the number of distinct cliques (k-plexes) in G. It recursively divides a graph until each task is sufficiently small to be processed in parallel. We then develop parallel solutions and demonstrate how graph partitioning can enable effective load balancing. Finally, we evaluate the performance of the proposed approach on real and synthetic graph data and show that it performs considerably better than existing approaches in both centralized and parallel settings. In the parallel setting, it achieves speedups of up to 10x over existing approaches on large graphs. Our parallel algorithms are primarily implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can easily generalize to other shared-nothing or shared-memory parallel frameworks. The work presented in this paper is an extension of our preliminary work on binary graph partitioning for maximal clique enumeration. In this work, we extend the proposed approach to handle maximal k-plex detection as well. (C) 2017 Elsevier Inc. All rights reserved.
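To make the idea of enumerating dense subgraphs by repeated two-way splitting more concrete, here is a minimal Python sketch. It is not the paper's algorithm: it is the classic Bron-Kerbosch recursion, written so that each step splits the search into "cliques that contain vertex v" and "cliques that avoid v". The function names `maximal_cliques` and `is_k_plex`, and the adjacency-dictionary graph representation, are assumptions made for illustration. The k-plex check uses the standard definition: a vertex set S is a k-plex if every vertex of S is adjacent to at least |S| - k other vertices of S, so a clique is a 1-plex.

```python
from typing import Dict, FrozenSet, Iterator, Set

Graph = Dict[int, Set[int]]  # hypothetical representation: vertex -> set of neighbours


def maximal_cliques(adj: Graph) -> Iterator[FrozenSet[int]]:
    """Yield every maximal clique of an undirected graph (illustrative sketch)."""

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
        # r: current clique, p: candidates that can extend r,
        # x: vertices already covered by another branch (used to test maximality).
        if not p:
            if not x:
                yield frozenset(r)  # nothing can extend r, so r is maximal
            return
        v = next(iter(p))  # branching vertex
        # Sub-problem 1: cliques that contain v live inside v's neighbourhood.
        yield from expand(r | {v}, p & adj[v], x & adj[v])
        # Sub-problem 2: cliques that avoid v; v moves to the excluded set.
        yield from expand(r, p - {v}, x | {v})

    yield from expand(set(), set(adj), set())


def is_k_plex(adj: Graph, s: Set[int], k: int) -> bool:
    """Return True if every vertex of s has at least |s| - k neighbours inside s."""
    return all(len(adj[v] & s) >= len(s) - k for v in s)


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant edge 2-3.
    graph: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(sorted(sorted(c) for c in maximal_cliques(graph)))
    print(is_k_plex(graph, {0, 1, 2, 3}, 2))
```

Because the two sub-problems at each split are independent, each could in principle be handed to a separate worker once it is small enough, which is the property the abstract relies on for load balancing. How the paper actually sizes and schedules tasks on MapReduce is not specified here, so the sketch stays sequential.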

Convex optimization for the planted k-disjoint-clique problem

Convex Formulation for Planted Quasi-Clique Recovery

Parallelizing Maximal Clique and K-Plex Enumeration over Graph Data

The Landscape of the Planted Clique Problem: Dense subgraphs and the Overlap Gap Property

Clustering Partially Observed Graphs via Convex Optimization

Planted Models for the Densest $k$-Subgraph Problem

Relaxed Graph Color Bound for the Maximum k-plex Problem

Dominating Set, Independent Set, Discrete $k$-Center, Dispersion, and Related Problems for Planar Points in Convex Position

Exact recovery of Planted Cliques in Semi-random graphs

Efficient Maximum k-Defective Clique Computation with Improved Time Complexity

Theoretically and Practically Efficient Maximum Defective Clique Search

KD-Club: an Efficient Exact Algorithm with New Coloring-based Upper Bound for the Maximum K-Defective Clique Problem

Homothetic Polygons and Beyond: Intersection Graphs, Recognition, and Maximum Clique

A Fast Algorithm to Compute Maximum K-Plexes in Social Network Analysis

Convex Hulls, Triangulations, and Voronoi Diagrams of Planar Point Sets on the Congested Clique

How to Hide a Clique?

Efficient Enumeration of Large Maximal k-Plexes

Scaling Up K-Clique Densest Subgraph Detection

A Faster Branching Algorithm for the Maximum $k$-Defective Clique Problem

Scalable $k$-clique Densest Subgraph Search
