DIMMining: Pruning-Efficient and Parallel Graph Mining on Near-Memory-Computing

Guohao Dai, Zhenhua Zhu, Tianyu Fu, Chiyue Wei, Bangyan Wang, Xiangyu Li, Yuan Xie, Huazhong Yang, Yu Wang
DOI: https://doi.org/10.1145/3470496.3527388
2022-01-01
Abstract: Graph mining, which finds specific patterns in a graph, is becoming increasingly important in various domains. We point out that accelerating graph mining faces the following challenges: (1) Heavy comparison for pruning: Pruning is widely used to reduce the search space in graph mining. It applies constraints on vertex indices and involves massive index comparisons. (2) Low parallelism of set operations: Typical graph mining algorithms can be expressed as a series of set operations between the neighbors of vertices, which suffer from low parallelism if vertices are streamed to the computation units. (3) Heavy data transfer: Graph mining needs to transfer intermediate data two orders of magnitude larger than the original data volume between the CPU and memory. To tackle these challenges, we propose DIMMining, with four techniques spanning algorithm to architecture. The Index Pre-comparison scheme is proposed for efficient pruning. We introduce the self anchor and neighbor partition to enable pre-comparison of vertex indices, which reduces comparisons at runtime. We propose a Flexible BCSR (Bitmap with Compressed Sparse Row) format to enable parallelism for set operations from the data structure perspective, which works on continuous vertices without memory space overhead. The Systolic Merge Array is designed to further exploit parallelism on discontinuous vertices from the architecture perspective. Finally, we propose a DIMM-based Near-Memory-Computing architecture, which eliminates the large-volume data transfer between computation and memory. Extensive experimental results on real-world graphs show that DIMMining achieves 222.23x and 139.51x speedup compared with FPGAs and CPUs, and 3.61x speedup over the state-of-the-art graph mining architecture.
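To make the first two challenges concrete, the following is a minimal Python sketch of triangle counting written as set intersections over sorted neighbor lists, with an index constraint (u > v > w) used for pruning. It is a generic software baseline for illustration only, not DIMMining's Index Pre-comparison, Flexible BCSR, or Systolic Merge Array; the names `intersect_below` and `count_triangles` and the adjacency-dictionary layout are assumptions made for this example.

```python
def intersect_below(a, b, bound):
    """Merge-intersect two sorted neighbor lists, keeping only IDs < bound.

    Each loop iteration performs index comparisons: against the pruning
    bound and between the two list heads.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        if x >= bound or y >= bound:
            break  # lists are sorted, so no later match can be below bound
        if x == y:
            out.append(x)
            i += 1
            j += 1
        elif x < y:
            i += 1
        else:
            j += 1
    return out


def count_triangles(adj):
    """Count triangles in an undirected graph.

    adj maps each vertex ID to its sorted neighbor list (assumed layout).
    The constraint u > v > w ensures each triangle is counted exactly once.
    """
    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if v >= u:
                break  # pruning: only consider neighbors with a smaller ID
            total += len(intersect_below(nbrs, adj[v], v))
    return total


if __name__ == "__main__":
    # A 4-clique: every choice of three vertices forms a triangle.
    clique = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
    print(count_triangles(clique))
```

The merge loop consumes the two lists one element at a time, which mirrors the low parallelism the abstract attributes to streaming vertices, and the repeated checks against `bound` are the kind of runtime index comparison that the abstract says pruning incurs.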