GastCoCo: Graph Storage and Coroutine-Based Prefetch Co-Design for Dynamic Graph Processing

Hongfu Li
2024-07-18
Abstract: An efficient data structure is fundamental to meeting the growing demands in dynamic graph processing. However, the dual requirements for graph computation efficiency (with contiguous structures) and graph update efficiency (with linked list-like structures) present a conflict in the design principles of graph structures. After experimental studies of existing state-of-the-art dynamic graph structures, we observe that the overhead of cache misses accounts for a major portion of the graph computation time. This paper presents GastCoCo, a system with graph storage and coroutine-based prefetch co-design. By employing software prefetching via stackless coroutines and introducing a prefetch-friendly data structure CBList, GastCoCo significantly alleviates the performance degradation caused by cache misses. Our results show that GastCoCo outperforms state-of-the-art graph storage systems by 1.3x - 180x in graph updates and 1.4x - 41.1x in graph computation.
Databases
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses a central tension in dynamic graph processing: the conflict between **graph computing efficiency** and **graph update efficiency**. Specifically:

1. **Graph computing efficiency**: Efficient graph computing requires a data structure that is contiguous in memory, so that it makes full use of the cache and reduces the performance loss caused by cache misses.
2. **Graph update efficiency**: Frequent graph updates (such as inserting and deleting edges) require a linked list or linked-list-like structure, so that insertions and deletions complete quickly without moving large amounts of data.

Existing graph storage structures tend to favor one of these requirements and perform poorly on the other. For example:

- **Adjacency List (AL)**: Well suited to graph updates, but has poor graph computing performance.
- **Compressed Sparse Row (CSR)**: Well suited to graph computing, but has extremely poor graph update performance.

To resolve this conflict, the paper proposes a new system, **GastCoCo**, which significantly reduces the performance degradation caused by cache misses through the co-design of graph storage and coroutine-based prefetching. Specific measures include:

- **Hardware-prefetching-aware data structure (CBList)**: A new data structure that supports efficient graph computing while remaining convenient for frequent graph updates.
- **Software prefetching**: Stackless coroutines are used to implement software prefetching, further optimizing the performance of graph computing and graph updates.
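The control flow behind coroutine-based software prefetching can be illustrated with a minimal sketch. It is not GastCoCo's implementation: Python generators stand in for stackless coroutines, Python cannot issue real prefetch instructions, and the `BLOCKS` layout, `find_edge`, and `run_interleaved` are hypothetical names invented for this example. The sketch only shows how each lookup suspends at the point where a native implementation would prefetch the next block, so that other lookups run while the memory access is in flight.

```python
from collections import deque

# Hypothetical chained-block layout: each block holds a few neighbor ids and
# the id of the next block in the chain (None at the end of the chain).
BLOCKS = {
    0: ([1, 2], 3),
    3: ([5], None),
    1: ([0, 4], None),
}

def find_edge(head, target, trace):
    """Stackless coroutine (a Python generator) that searches one chain."""
    block_id = head
    while block_id is not None:
        # A native implementation would issue a prefetch for this block here
        # and suspend, so the cache miss overlaps with other lookups.
        trace.append(("prefetch", block_id))
        yield
        neighbors, next_id = BLOCKS[block_id]
        trace.append(("visit", block_id))
        if target in neighbors:
            return True
        block_id = next_id
    return False

def run_interleaved(tasks):
    """Round-robin scheduler: resume each coroutine until all have finished."""
    results = {}
    queue = deque(tasks.items())
    while queue:
        name, coro = queue.popleft()
        try:
            next(coro)                  # run until the next suspension point
            queue.append((name, coro))  # not finished yet, reschedule
        except StopIteration as done:
            results[name] = done.value  # the generator's return value
    return results

trace = []
results = run_interleaved({
    "a": find_edge(0, 5, trace),
    "b": find_edge(1, 4, trace),
})
print(results)
print(trace)
```

In the recorded trace, the "prefetch" steps of both lookups come before either block is visited, which is the overlap that the interleaving is meant to create.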
Through these innovations, GastCoCo outperforms existing state-of-the-art graph storage systems by 1.3x to 180x in graph updates and by 1.4x to 41.1x in graph computation.
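The AL versus CSR trade-off described above can be made concrete with a small sketch. It uses plain Python lists rather than any structure from the paper, and `build_csr`, `csr_insert`, and `al_insert` are hypothetical helper names. Inserting one edge into CSR shifts every later entry of the contiguous array and updates the offsets of all following vertices, while the adjacency list appends to a single per-vertex list.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, targets) from (src, dst) pairs."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1           # count out-degree of each vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]    # prefix sum gives range boundaries
    targets = [0] * len(edges)
    cursor = offsets[:-1]               # slice copy: next free slot per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

def csr_insert(offsets, targets, src, dst):
    """Insert an edge into CSR; return how many existing entries shift."""
    pos = offsets[src + 1]              # end of src's neighbor range
    shifted = len(targets) - pos        # entries that move one slot right
    targets.insert(pos, dst)
    for v in range(src + 1, len(offsets)):
        offsets[v] += 1                 # every later range moves by one
    return shifted

def al_insert(adj, src, dst):
    """Insert an edge into an adjacency list; no other vertex is touched."""
    adj[src].append(dst)

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 1)]
offsets, targets = build_csr(4, edges)
adj = [[1, 2], [2], [0], [1]]

shifted = csr_insert(offsets, targets, 0, 3)
al_insert(adj, 0, 3)
print(offsets, targets, shifted)
print(adj)
```

The flip side is not measured here: scanning `targets` walks one contiguous array, whereas the per-vertex lists of an adjacency list may be scattered in memory, which is the source of the cache misses the paper targets.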