Design and Implementation of External Storage Large-Scale Graph Computing System.

Lingbin Liu, Jianqiang Huang, Dongqiang Huang, Haodong Bian, Xiaoying Wang
DOI: https://doi.org/10.1145/3606043.3606085
2023-01-01
Abstract: With the rise of big data, graph computing has become prevalent in many fields, and large-scale graph computing systems have emerged to handle the resulting workloads. Most existing systems adopt memory-based computing frameworks, but the rapid growth of data scale often means the graph no longer fits in memory. To tackle this issue, an external storage-based graph computing system called DCGraph has been designed and implemented. The bottleneck of large-scale graph computing systems based on single-machine external storage usually lies in external storage I/O, so DCGraph is optimized for external storage I/O. In the preprocessing stage, graph data is compressed and transformed into a two-dimensional CSC format in which each block records the offsets of its data. This guarantees that the data of the same target vertex is stored contiguously in external storage within the same block. The DCSC format is selected for compressing data blocks that contain many vertices of zero in-degree. Additionally, a jump-out calculation mode suited to the CSC format is designed for specific graph algorithms: during data reading, the subscript and offset of the current record determine whether it is skipped. Selective scheduling skips inactive data blocks in each iteration to avoid unnecessary data reading and processing. Together, these techniques reduce both the number of I/Os and the total amount of data transferred. Experimental results show that, compared to GridGraph, DCGraph achieves a speedup of 1.5 to 2.8 times while reducing the total amount of I/O during computation to less than half of that of GridGraph.
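The techniques named in the abstract can be illustrated with a small in-memory sketch. It is not DCGraph's implementation: the function names (`build_blocks`, `bfs`), the grid parameter `p`, and the choice of BFS as the example algorithm are all assumptions made for illustration, and the "jump-out" step reflects one plausible reading of the abstract (stop scanning a target vertex's in-edges once its result is known). DCSC compression and real disk I/O are omitted; `blocks_read` merely counts the blocks that would have been loaded.

```python
from collections import defaultdict

def build_blocks(edges, num_vertices, p):
    """Partition edges into a p x p grid and store each block in CSC form.

    Hypothetical layout: block (i, j) holds edges whose source lies in
    interval i and whose target lies in interval j.
    """
    size = (num_vertices + p - 1) // p  # vertices per interval
    buckets = defaultdict(list)
    for src, dst in edges:
        buckets[(src // size, dst // size)].append((src, dst))
    blocks = {}
    for (i, j), block_edges in buckets.items():
        lo = j * size
        width = min(size, num_vertices - lo)
        # Sorting by target keeps each target's in-edges contiguous.
        block_edges.sort(key=lambda e: (e[1], e[0]))
        offsets = [0] * (width + 1)
        for _, dst in block_edges:
            offsets[dst - lo + 1] += 1
        for k in range(width):
            offsets[k + 1] += offsets[k]  # prefix sum -> CSC column offsets
        blocks[(i, j)] = (lo, offsets, [s for s, _ in block_edges])
    return blocks, size

def bfs(edges, num_vertices, p, root):
    """Level-synchronous BFS over the blocks; returns (levels, blocks_read)."""
    blocks, size = build_blocks(edges, num_vertices, p)
    level = [-1] * num_vertices
    level[root] = 0
    frontier = {root}
    depth = 0
    blocks_read = 0
    while frontier:
        active_rows = {v // size for v in frontier}
        next_frontier = set()
        for (i, _), (lo, offsets, sources) in blocks.items():
            if i not in active_rows:
                continue  # selective scheduling: block has no active source
            blocks_read += 1
            for k in range(len(offsets) - 1):
                dst = lo + k
                if level[dst] != -1:
                    continue  # already visited: skip its whole offset range
                for pos in range(offsets[k], offsets[k + 1]):
                    if sources[pos] in frontier:
                        level[dst] = depth + 1
                        next_frontier.add(dst)
                        break  # jump out: remaining in-edges cannot change dst
        frontier = next_frontier
        depth += 1
    return level, blocks_read

if __name__ == "__main__":
    sample = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    print(bfs(sample, num_vertices=6, p=2, root=0))
```

In this toy setting, the per-target offsets are what make both skips cheap: a visited target's entire range can be passed over without touching its edges, and a block whose source interval holds no frontier vertex is never read at all.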