Abstract: Recently, Graph Convolutional Networks (GCNs) have shown powerful learning capabilities in graph processing tasks. Computing GCNs on conventional von Neumann architectures usually suffers from limited memory bandwidth due to irregular memory access. Recent work has proposed Processing-In-Memory (PIM) architectures to overcome the bandwidth bottleneck in Convolutional Neural Networks (CNNs) by performing in-situ matrix-vector multiplication. However, the performance improvement and computation parallelism of existing CNN-oriented PIM architectures are hindered when running GCNs because of the large scale and sparsity of graphs. To tackle these problems, this paper presents a parallelism enhancement framework for PIM-based GCN architectures. At the software level, we propose a fixed-point quantization method for GCNs, which reduces the PIM computation overhead with little accuracy loss. We also apply a vertex clustering algorithm to the graph, minimizing the inter-cluster links and enabling cluster-level parallel computing on multi-core systems. At the hardware level, we design a Resistive Random Access Memory (RRAM) based multi-core PIM architecture for GCNs that supports cluster-level parallelism. In addition, we propose a coarse-grained pipeline dataflow to cover the RRAM write costs and improve GCN computation throughput. At the software/hardware interface level, we propose a PIM-aware GCN mapping strategy to achieve the optimal tradeoff between resource utilization and computation performance. We also propose edge dropping methods to reduce inter-core communication with little accuracy loss. We evaluate our framework on typical datasets with multiple widely used GCN models. Experimental results show that the proposed framework achieves $698\times$, $89\times$, and $41\times$ speedup with $7108\times$, $255\times$, and $31\times$ energy efficiency enhancement compared with CPUs, GPUs, and ASICs, respectively.
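As a rough illustration of what fixed-point quantization of GCN weights or features can look like, the sketch below rounds real values onto a signed fixed-point grid and clamps them to the representable range. The abstract does not specify the bit widths or the rounding scheme, so the 8-bit word with 4 fractional bits and the function names `quantize_fixed_point` and `dequantize_fixed_point` are assumptions made for this example, not the paper's method.

```python
def quantize_fixed_point(values, total_bits=8, frac_bits=4):
    """Map real values to signed fixed-point integer codes.

    Assumed format (illustrative only): total_bits-wide signed word
    with frac_bits fractional bits, round-to-nearest, saturating.
    """
    scale = 1 << frac_bits                    # 2**frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))          # smallest representable code
    q_max = (1 << (total_bits - 1)) - 1       # largest representable code
    codes = []
    for v in values:
        code = round(v * scale)               # nearest grid point
        code = max(q_min, min(q_max, code))   # saturate out-of-range values
        codes.append(code)
    return codes


def dequantize_fixed_point(codes, frac_bits=4):
    """Convert integer codes back to real values on the fixed-point grid."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.3, 100.0]
    codes = quantize_fixed_point(weights)
    print(codes)
    print(dequantize_fixed_point(codes))
```

For values inside the representable range, the round-trip error is at most half a grid step (here 1/32). Out-of-range values such as `100.0` saturate at the largest code.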

GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing

MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization

NTGAT: A Graph Attention Network Accelerator with Runtime Node Tailoring

NDPGNN: A Near-Data Processing Architecture for GNN Training and Inference Acceleration

Accelerating Graph Convolutional Networks Through a PIM-Accelerated Approach

SH-GAT: Software-hardware co-design for accelerating graph attention networks on FPGA

GNN-PIM: A Processing-in-Memory Architecture for Graph Neural Networks

NEM-GNN - DAC/ADC-less, scalable, reconfigurable, graph and sparsity-aware near-memory accelerator for graph neural networks

Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective

Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators

EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks

GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing

Fe-GCN: A 3D FeFET Memory Based PIM Accelerator for Graph Convolutional Networks

DyGA: A Hardware-Efficient Accelerator with Traffic-Aware Dynamic Scheduling for Graph Convolutional Networks

FPGAN: An FPGA Accelerator for Graph Attention Networks With Software and Hardware Co-Optimization

FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training

An Energy-Efficient In-Memory Accelerator for Graph Construction and Updating