Finding SCCs in Real-World Graphs on External Memory: A Task-Based Approach

Huiming Lv, Zhiyuan Shao, Lang Li, Xuanhua Shi, Hai Jin
DOI: https://doi.org/10.1109/ispdc.2016.15
2017-01-01
Concurrency and Computation Practice and Experience
Abstract: Finding Strongly Connected Components (SCCs) in graphs is an important research topic in graph data mining. Traditional methods of finding SCCs must load the whole graph into the main memory of a computer before processing begins. However, with the rapid growth of real-world graphs, graph sizes easily exceed the main memory of an ordinary computer. Both distributed graph processing systems running on clusters and out-of-core systems that use external memory can handle such huge graphs, but recent evidence (e.g., GridGraph) shows that external memory systems are more cost-effective and efficient than distributed systems for most graph mining tasks. Existing external memory solutions are inefficient at finding SCCs in large-scale graphs for two reasons: 1) the data-parallel processing model they adopt is not efficient for finding SCCs in a large-scale graph; 2) their poor support for graph mutation incurs excessively high overhead. In this paper, we study the problem of finding SCCs in big real-world graphs using external memory. We propose a task-based approach and an efficient graph mutation method to address these limitations of existing external memory solutions. Experimental results show that our approach is orders of magnitude faster than existing external memory solutions.