Abstract: In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs such as cliques and k-plexes is an essential component. Unfortunately, these problems are NP-Complete and thus computationally intensive at scale. There is therefore a need for techniques that distribute the computation across multiple machines, so that a computation that is too time-consuming on a single machine can be performed efficiently on a machine cluster, provided the cluster is large enough. In this paper, we first propose a new approach for maximal clique and k-plex enumeration, which identifies dense subgraphs by binary graph partitioning. The approach recursively divides a graph until each task is sufficiently small to be processed in parallel. Given a connected graph G = (V, E), it has a space complexity of O(|E|) and a time complexity of O(|E|mu(G)), where mu(G) represents the number of different cliques (k-plexes) existing in G. We then develop parallel solutions and demonstrate how graph partitioning can enable effective load balancing. Finally, we evaluate the performance of the proposed approach on real and synthetic graph data and show that it performs considerably better than existing approaches in both centralized and parallel settings. In the parallel setting, it achieves speedups of up to 10x over existing approaches on large graphs. Our parallel algorithms are primarily implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can easily generalize to other shared-nothing or shared-memory parallel frameworks. The work presented in this paper is an extension of our preliminary work on binary graph partitioning for maximal clique enumeration. In this work, we extend the proposed approach to handle maximal k-plex detection as well. (C) 2017 Elsevier Inc. All rights reserved.
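
To make the idea of recursively splitting an enumeration into independent tasks concrete, the following is a minimal sketch in Python. It is not the paper's algorithm: it is the classic binary-branching form of Bron-Kerbosch, in which each task is split into "cliques that contain a chosen vertex v" and "cliques that do not". The function names, the adjacency-dictionary representation, and the example graph are all assumptions made for illustration. A small helper also checks the standard k-plex condition (every vertex of S is adjacent to at least |S| - k other vertices of S), under which a clique is a 1-plex.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours

def enumerate_maximal_cliques(adj: Graph) -> List[FrozenSet[int]]:
    """Enumerate maximal cliques with an explicit work list of binary tasks.

    Each task (r, p, x) holds the current clique r, the candidate vertices p
    that can still extend r, and the excluded vertices x that could extend r
    but are covered by another task. Illustrative sketch only.
    """
    tasks: List[Tuple[FrozenSet[int], Set[int], Set[int]]] = [
        (frozenset(), set(adj), set())
    ]
    cliques: List[FrozenSet[int]] = []
    while tasks:
        r, p, x = tasks.pop()
        if not p:
            # r is maximal only if no excluded vertex could still extend it.
            if not x:
                cliques.append(r)
            continue
        v = next(iter(p))
        # Binary split: the two tasks are independent, so in principle they
        # could be handed to different workers.
        tasks.append((r, p - {v}, x | {v}))              # cliques without v
        tasks.append((r | {v}, p & adj[v], x & adj[v]))  # cliques with v
    return cliques

def is_k_plex(adj: Graph, s: Set[int], k: int) -> bool:
    """Return True if every vertex in s has at least |s| - k neighbours in s."""
    return all(len(adj[u] & s) >= len(s) - k for u in s)

# Hypothetical example graph: a triangle 0-1-2 with a pendant edge 2-3.
example: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
found = set(enumerate_maximal_cliques(example))
```

On the example graph, `found` contains the two maximal cliques {0, 1, 2} and {2, 3}. The explicit work list is a design choice meant to mirror the task-splitting view: each popped task depends only on its own (r, p, x) triple, which is the property a shared-nothing framework would need in order to process tasks on separate machines.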

GraphCube: Interconnection Hierarchy-aware Graph Processing

Graphine: Programming Graph-Parallel Computation of Large Natural Graphs on Multicore Cluster

GraphCP: An I/O-Efficient Concurrent Graph Processing Framework

Efficient graph computation on hybrid CPU and GPU systems

CGgraph: an Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processor

A Distributed Graph-Parallel Computing System with Lightweight Communication Overhead

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning

Parallelizing Maximal Clique and K-Plex Enumeration over Graph Data

3-D Partitioning for Large-Scale Graph Processing

Exploring the Hidden Dimension in Graph Processing

Graph3S: A Simple, Speedy and Scalable Distributed Graph Processing System

Efficient Processing of Very Large Graphs in a Small Cluster

Graph for Science: From API based Programming to Graph Engine based Programming for HPC

Parallelizing Clique and Quasi-Clique Detection over Graph Data

Gram: Scaling Graph Computation To The Trillions

GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing

Gemini: A Computation-Centric Distributed Graph Processing System

A disk I/O optimized system for concurrent graph processing jobs

GraphA: Efficient Partitioning and Storage for Distributed Graph Computation

Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data

ScalaGraph: A Scalable Accelerator for Massively Parallel Graph Processing