ReaDy: A ReRAM-Based Processing-in-Memory Accelerator for Dynamic Graph Convolutional Networks.

Yu Huang, Long Zheng, Pengcheng Yao, Qinggang Wang, Haifeng Liu, Xiaofei Liao, Hai Jin, Jingling Xue
DOI: https://doi.org/10.1109/tcad.2022.3199152
IF: 2.9
2022-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Dynamic graph convolutional networks (DGCNs) have emerged as an effective approach to analyzing graph data that is constantly changing. Typical DGCNs incorporate not only graph convolutional networks (GCNs) to extract structural information but also recurrent neural networks (RNNs) to capture temporal information from evolving graph data. These two alternating execution kernels pose unique architectural challenges, since both types of kernels must be implemented efficiently. The complex execution patterns of DGCNs render existing architectures unsuitable. In this article, we present the first DGCN accelerator with an integrated architecture, named ReaDy, to accelerate DGCNs based on emerging PIM-featured ReRAM architectures. ReaDy's integrated architecture enables the GCN and RNN kernels of DGCNs to run simultaneously. Specifically, ReaDy is equipped with a redundancy-free scheduling mechanism that alleviates the intrinsic dynamic irregularity of the GCN kernel, improving hardware utilization. In addition, ReaDy includes a locality-aware dataflow strategy that exploits the inherent intervertex data locality of the RNN kernel, reducing superfluous data accesses to vertices and weight parameters. At the system level, ReaDy further uses an interkernel pipeline to reduce off-chip accesses to intermediate results, significantly boosting the overall efficiency of DGCNs. Compared to the state-of-the-art software framework, PyGT, running on an Intel Xeon E5-2680v4 CPU and an NVIDIA Ampere A100 GPU, ReaDy achieves average speedups of $955\times$ and $27.33\times$, and average energy savings of $1093\times$ and $80.21\times$, respectively. In addition, ReaDy outperforms ReFlip-ERA, which is obtained by combining a state-of-the-art GCN accelerator, ReFlip, and an RNN accelerator, ERA-LSTM, by an average speedup of $8.30\times$ and an average energy saving of $7.29\times$.