Abstract: Processing-In-Memory (PIM) is an effective technique that reduces data movement by integrating processing units within memory. Recent advances in "big data" and 3D stacking technology make PIM a practical and viable solution for modern data processing workloads, as exemplified by the recent research interest in PIM-based acceleration. Among these efforts, TESSERACT is a PIM-enabled parallel graph processing architecture based on Micron's Hybrid Memory Cube (HMC), one of the most prominent 3D-stacked memory technologies. It implements a Pregel-like vertex-centric programming model, so that users can develop programs in a familiar interface while taking advantage of PIM. Despite its orders-of-magnitude speedup over DRAM-based systems, TESSERACT generates excessive cross-cube communication through SerDes links, whose bandwidth is much lower than the aggregate local bandwidth of the HMCs. Our investigation indicates that this is caused by the restricted data organization required by the vertex programming model. In this paper, we argue that a PIM-based graph processing system should take data organization as a first-order design consideration. Following this principle, we propose GRAPHP, a novel HMC-based software/hardware co-designed graph processing system that drastically reduces communication and energy consumption compared to TESSERACT. GRAPHP features three key techniques: 1) "source-cut" partitioning, which fundamentally changes the cross-cube communication from one remote put per cross-cube edge to one update per replica; 2) the "Two-phase Vertex Program", a programming model designed for "source-cut" partitioning with two operations, GenUpdate and ApplyUpdate; 3) hierarchical communication and overlapping, which further improve performance by exploiting unique opportunities offered by the proposed partitioning and programming model. We evaluate GRAPHP using a cycle-accurate simulator with 5 real-world graphs and 4 algorithms. The results show that it provides on average a 1.7× speedup and 89% energy saving compared to TESSERACT.
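To make the communication argument concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that under "source-cut" partitioning each edge is stored in the cube that owns its destination vertex, and that a source vertex owned by another cube gets one local replica there. The function names `count_messages` and `two_phase_iteration`, the callback signatures for `gen_update` and `apply_update`, and the toy graph are all hypothetical; the abstract only names the GenUpdate and ApplyUpdate operations.

```python
from collections import defaultdict


def count_messages(edges, cube_of):
    """Return (per-edge remote puts, per-replica updates) for one iteration."""
    # Per-edge scheme: one remote put for every edge crossing cubes.
    per_edge = sum(1 for src, dst in edges if cube_of[src] != cube_of[dst])
    # Assumed source-cut scheme: a remote source is replicated once per
    # destination cube, so the master sends one update per replica.
    replicas = {(src, cube_of[dst]) for src, dst in edges
                if cube_of[src] != cube_of[dst]}
    return per_edge, len(replicas)


def two_phase_iteration(edges, cube_of, value, gen_update, apply_update):
    """One iteration of a hypothetical two-phase vertex program."""
    # Communication: every cube sees its own masters plus refreshed replicas.
    local_copy = defaultdict(dict)  # cube -> {vertex: value}
    for v, cube in cube_of.items():
        local_copy[cube][v] = value[v]
    for src, dst in edges:
        local_copy[cube_of[dst]][src] = value[src]
    # Phase 1 (GenUpdate): reduce in-edge contributions using local data only.
    incoming = defaultdict(list)
    for src, dst in edges:
        incoming[dst].append(local_copy[cube_of[dst]][src])
    # Phase 2 (ApplyUpdate): fold the generated update into each vertex value.
    return {v: apply_update(value[v], gen_update(incoming[v])) for v in cube_of}


# Toy graph: vertices 0 and 1 live in cube 0, vertices 2 and 3 in cube 1.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 0)]
cube_of = {0: 0, 1: 0, 2: 1, 3: 1}
print(count_messages(edges, cube_of))  # (5, 3)

# Minimum-label propagation as an example algorithm.
labels = {v: v for v in cube_of}
new_labels = two_phase_iteration(
    edges, cube_of, labels,
    gen_update=lambda xs: min(xs, default=float("inf")),
    apply_update=min,
)
print(new_labels)  # {0: 0, 1: 1, 2: 0, 3: 0}
```

On this toy graph all five edges cross cubes, yet only three replicas exist, so the per-replica scheme sends three messages instead of five. The gap grows with the number of edges that share a source and a destination cube.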

An Energy-Efficient In-Memory Accelerator for Graph Construction and Updating

A design framework for processing-in-memory accelerator

DCIM-GCN: Digital Computing-in-Memory Accelerator for Graph Convolutional Network

Accelerating Graph Convolutional Networks Through a PIM-Accelerated Approach

Towards efficient allocation of graph convolutional networks on hybrid computation-in-memory architecture

Fe-GCN: A 3D FeFET Memory Based PIM Accelerator for Graph Convolutional Networks

DyGA: A Hardware-Efficient Accelerator with Traffic-Aware Dynamic Scheduling for Graph Convolutional Networks

GraphIA: An In-Situ Accelerator for Large-Scale Graph Processing

Balancing Memory Accesses for Energy-Efficient Graph Analytics Accelerators

Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators

GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing

GShuttle: Optimizing Memory Access Efficiency for Graph Convolutional Neural Network Accelerators

GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition

GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing

Cacheap: Portable and Collaborative I/O Optimization for Graph Processing

A Case for In-Memory Random Scatter-Gather for Fast Graph Processing

HyGCN: A GCN Accelerator with Hybrid Architecture

A Task-Adaptive In-Situ ReRAM Computing for Graph Convolutional Networks

PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures

GraphR: Accelerating Graph Processing Using ReRAM