Abstract: Recently, Graph Convolutional Networks (GCNs) have shown powerful learning capabilities in graph processing tasks. Computing GCNs on conventional von Neumann architectures is often limited by memory bandwidth because of irregular memory accesses. Recent work has proposed Processing-In-Memory (PIM) architectures to overcome the bandwidth bottleneck in Convolutional Neural Networks (CNNs) by performing in-situ matrix-vector multiplication. However, the performance improvement and computation parallelism of existing CNN-oriented PIM architectures are hindered when running GCNs because of the large scale and sparsity of graphs. To tackle these problems, this paper presents a parallelism enhancement framework for PIM-based GCN architectures. At the software level, we propose a fixed-point quantization method for GCNs, which reduces the PIM computation overhead with little accuracy loss. We also apply a vertex clustering algorithm to the graph, minimizing the inter-cluster links and enabling cluster-level parallel computing on multi-core systems. At the hardware level, we design a Resistive Random Access Memory (RRAM) based multi-core PIM architecture for GCNs, which supports cluster-level parallelism. In addition, we propose a coarse-grained pipeline dataflow to hide the RRAM write costs and improve the GCN computation throughput. At the software/hardware interface level, we propose a PIM-aware GCN mapping strategy to achieve the optimal tradeoff between resource utilization and computation performance. We also propose edge dropping methods to reduce inter-core communication with little accuracy loss. We evaluate our framework on typical datasets with multiple widely used GCN models. Experimental results show that the proposed framework achieves $698\times$, $89\times$, and $41\times$ speedup with $7108\times$, $255\times$, and $31\times$ energy efficiency enhancement compared with CPUs, GPUs, and ASICs, respectively.
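As a rough illustration of two of the software-level ideas named in the abstract, the sketch below quantizes values to a signed fixed-point format and thins out inter-cluster edges given a vertex-to-cluster assignment. It is a minimal sketch under stated assumptions, not the paper's method: the function names, the 8-bit format with 4 fractional bits, and the stride-based dropping rule are all hypothetical stand-ins, since the abstract does not specify the quantization format, the clustering algorithm, or the edge dropping criterion.

```python
def quantize_fixed_point(values, total_bits=8, frac_bits=4):
    """Round each value to a signed fixed-point grid and saturate.

    Assumption: a plain two's-complement format; the bit widths are
    illustrative and not taken from the paper.
    """
    scale = 1 << frac_bits
    qmin = -(1 << (total_bits - 1))
    qmax = (1 << (total_bits - 1)) - 1
    codes = [max(qmin, min(qmax, round(v * scale))) for v in values]
    return codes, [c / scale for c in codes]


def split_edges(edges, cluster_of):
    """Separate edges into intra-cluster and inter-cluster lists."""
    intra, inter = [], []
    for u, v in edges:
        (intra if cluster_of[u] == cluster_of[v] else inter).append((u, v))
    return intra, inter


def drop_inter_cluster_edges(edges, cluster_of, keep_every=2):
    """Keep all intra-cluster edges and every `keep_every`-th inter-cluster edge.

    Assumption: a fixed stride is a placeholder for whatever dropping
    criterion the paper actually uses.
    """
    intra, inter = split_edges(edges, cluster_of)
    return intra + inter[::keep_every]


if __name__ == "__main__":
    codes, approx = quantize_fixed_point([0.5, -1.26, 100.0])
    print(codes, approx)

    edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
    cluster_of = {0: 0, 1: 0, 2: 1, 3: 1}  # hypothetical clustering result
    print(drop_inter_cluster_edges(edges, cluster_of))
```

In this toy setting, each cluster would map to one PIM core, so every inter-cluster edge that is dropped removes one cross-core feature transfer, while intra-cluster edges are left untouched.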

PASGCN: An ReRAM-Based PIM Design for GCN With Adaptively Sparsified Graphs

Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators

Accelerating Graph Convolutional Networks Through a PIM-Accelerated Approach

A design framework for processing-in-memory accelerator

DCIM-GCN: Digital Computing-in-Memory Accelerator for Graph Convolutional Network

Fe-GCN: A 3D FeFET Memory Based PIM Accelerator for Graph Convolutional Networks

A Task-Adaptive In-Situ ReRAM Computing for Graph Convolutional Networks

PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation

RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN

GNN-PIM: A Processing-in-Memory Architecture for Graph Neural Networks

SEAL-lab Technical Report No. 2015-001 (April 29, 2016): Processing-in-Memory in ReRAM-based Main Memory

ReHy: A ReRAM-based Digital/Analog Hybrid PIM Architecture for Accelerating CNN Training

SEAL-lab Technical Report No. 2015-001 (November 30, 2015): Processing-in-Memory in ReRAM-based Main Memory

TensorCache: Reconstructing Memory Architecture with SRAM-Based In-Cache Computing for Efficient Tensor Computations in GPGPUs

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Generalized Ping-Pong: Off-Chip Memory Bandwidth Centric Pipelining Strategy for Processing-In-Memory Accelerators

VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations