Divide & Conquer: I/O Efficient Depth-First Search
Zhiwei Zhang,Jeffrey Xu Yu,Lu Qin,Zechao Shang
DOI: https://doi.org/10.1145/2723372.2723740
2015-01-01
Abstract:Depth-First Search (DFS), which traverses a graph in the depth-first order, is one of the fundamental graph operations, and the result of DFS over all nodes in G is a spanning tree known as a D ES-Tree. There are many graph algorithms that need DFS such as connected component computation, topological sort, community detection, eulerian path computation, graph bipartiteness testing, planar graph testing, etc, because the in-memory DFS algorithm shows it can be done in linear time w.r.t. the size of G. However, given the fact that real-world graphs grow rapidly in the big data era, the in-memory DFS algorithm cannot be used to handle a large graph that cannot be entirely held in main memory. In this paper, we focus on I/O efficiency and study semi-external algorithms to DFS a graph G which is on disk. Here, like the existing semi-external algorithms, we assume that a spanning tree of G can be held in main memory and the remaining edges of G are kept on disk, and compute the DFS-Tree in main memory with which DFS can be identified. We propose novel divide & conquer algorithms to DFS over a graph G on disk. In brief, we divide a graph into several subgraphs, compute the DFS-Tree for each subgraph independently, and then merge them together to compute the DES-Tree for the whole graph. With the global DES-Tree computed we identify DFS. We discuss the valid division, that can lead to the correct DFS, and the challenges to do so. We propose two division algorithms, named Divide-Star and Divide-TD, and a merge algorithm. We conduct extensive experimental studies using four real massive datasets and several synthetic datasets to confirm the I/O efficiency of our approach.