Abstract: A strongly connected component (SCC) is a maximal subgraph of a directed graph G in which every pair of nodes is mutually reachable within the SCC. Because of this property, a general directed graph can be represented by a directed acyclic graph (DAG) by contracting every SCC of G into a single node of the DAG. In many real applications that need graph pattern matching, topological sorting, or reachability query processing, the best way to deal with a general directed graph is to work with its DAG representation. Finding all SCCs in a directed graph G is therefore a critical operation. The existing in-memory algorithms based on depth-first search (DFS) find all SCCs in time linear in the size of the graph. However, when a graph cannot reside entirely in main memory, the existing external and semi-external algorithms for finding all SCCs are limited in the I/O efficiency they can achieve. In this paper, we study new I/O-efficient semi-external algorithms that find all SCCs of a massive directed graph G that cannot reside entirely in main memory. To overcome the deficiency of the existing DFS-based semi-external algorithm, which relies heavily on a total order, we explore a weak order and investigate new algorithms based on it. We propose a new two-phase algorithm consisting of tree construction and tree search. In the tree construction phase, a spanning tree of G is constructed in a bounded number of sequential scans of G. In the tree search phase, a single sequential scan of the graph suffices to find all SCCs. In addition, we propose a new single-phase algorithm, which combines the tree construction and tree search phases into one phase and adds three new optimization techniques: early acceptance, early rejection, and batch processing. With these optimizations, the single-phase algorithm significantly reduces both the number of I/Os and the CPU cost. We prove the correctness of the algorithms. We conduct extensive experimental studies on four real datasets, including one massive dataset, and on several synthetic datasets to confirm the I/O efficiency of our approaches.
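To make the two basic notions in the abstract concrete (finding SCCs with an in-memory DFS, and contracting each SCC into a DAG node), here is a minimal Python sketch. It uses the classic Tarjan algorithm in iterative form and is only an illustration of the in-memory baseline; it is not the semi-external algorithm proposed in the paper. The function names `find_sccs` and `condense`, and the choice of integer node ids `0..n-1` with an edge list, are assumptions made for this example.

```python
def find_sccs(n, edges):
    """Label every node 0..n-1 with an SCC id (iterative Tarjan, in-memory).

    Returns (comp, k): comp[v] is the SCC id of node v, k is the SCC count.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    index = [-1] * n          # DFS discovery number, -1 means unvisited
    low = [0] * n             # smallest discovery number reachable from v
    on_stack = [False] * n
    comp = [-1] * n
    stack = []                # nodes whose SCC is not yet decided
    counter = 0
    k = 0

    for root in range(n):
        if index[root] != -1:
            continue
        index[root] = low[root] = counter
        counter += 1
        stack.append(root)
        on_stack[root] = True
        work = [(root, iter(adj[root]))]   # explicit DFS stack
        while work:
            v, it = work[-1]
            descended = False
            for w in it:
                if index[w] == -1:
                    # Tree edge: descend into w.
                    index[w] = low[w] = counter
                    counter += 1
                    stack.append(w)
                    on_stack[w] = True
                    work.append((w, iter(adj[w])))
                    descended = True
                    break
                if on_stack[w]:
                    # Edge into the current DFS stack: may close a cycle.
                    low[v] = min(low[v], index[w])
            if descended:
                continue
            # All out-edges of v are processed.
            work.pop()
            if work:
                parent = work[-1][0]
                low[parent] = min(low[parent], low[v])
            if low[v] == index[v]:
                # v is the root of an SCC: pop its members off the stack.
                while True:
                    w = stack.pop()
                    on_stack[w] = False
                    comp[w] = k
                    if w == v:
                        break
                k += 1
    return comp, k


def condense(n, edges):
    """Contract every SCC to one node and return the resulting DAG edges."""
    comp, k = find_sccs(n, edges)
    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, k, dag_edges


# Two cycles, {0, 1, 2} and {3, 4}, joined by the single edge 2 -> 3.
example_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
comp, k, dag_edges = condense(5, example_edges)
```

In this example the two cycles collapse into two DAG nodes connected by one edge. The sketch holds the whole adjacency list in memory, which is exactly the assumption that fails for the massive graphs the abstract targets.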