Abstract: Graph traversal is widely used in map routing, social network analysis, causal discovery, and many other applications. Because it is a memory-bound process, graph traversal puts significant pressure on the memory subsystem. Due to poor spatial locality and the increasing size of today’s datasets, graph traversal consumes an ever-larger share of application execution time. One way to mitigate this cost is memory prefetching, which issues requests from the processor to memory in anticipation of needing certain data. However, traditional prefetching does not work well for graph traversal because of data dependencies, the parallel nature of graphs, and the need to move vast amounts of data from memory to the caches. In this paper, we propose CGAcc, a graph accelerator on the Hybrid Memory Cube (HMC) based on the Compressed Sparse Row (CSR) representation. CGAcc combines the CSR graph representation with in-memory prefetching and processing to improve the performance of graph traversal. Our approach integrates the prefetching and processing in the logic layer of a 3D-stacked Dynamic Random-Access Memory (DRAM) architecture, based on Micron’s HMC. We selected HMC to implement CGAcc because it provides high bandwidth and low access latency. Furthermore, this device has multiple DRAM layers connected to internal logic that controls memory access and performs rudimentary computation. Using the CSR representation, CGAcc deploys prefetchers in the HMC to exploit the short transaction latency between the logic and DRAM layers, which also avoids large data movement costs. At runtime, CGAcc pipelines the prefetching of data from the DRAM arrays to improve memory-level parallelism. To further reduce access latency, several optimized internal caches hold the prefetched data to be Processed In-Memory (PIM). A comprehensive evaluation shows that, compared to a conventional HMC main memory equipped with a stream prefetcher, CGAcc achieves an average 3.51× speedup with moderate hardware cost.
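To make the data dependencies mentioned in the abstract concrete, the following is a minimal software sketch of breadth-first traversal over a CSR graph. It is not CGAcc's hardware design; the function names `build_csr` and `bfs_csr` and the array names `offsets`, `neighbors`, and `depth` are illustrative assumptions. The inner loop shows the chain of dependent loads (vertex, then its offset range, then each neighbor ID, then that neighbor's visited state) that a conventional stream prefetcher tends to handle poorly.

```python
from collections import deque


def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, neighbors) from a directed edge list."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1

    # offsets[v] .. offsets[v + 1] delimits v's slice of the neighbors array.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]

    neighbors = [0] * len(edges)
    cursor = offsets[:-1]  # next free slot per vertex (slicing copies the list)
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def bfs_csr(offsets, neighbors, source):
    """Return BFS depth per vertex; -1 marks vertices unreachable from source."""
    depth = [-1] * (len(offsets) - 1)
    depth[source] = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        # Dependent load 1: the offset range for v.
        for i in range(offsets[v], offsets[v + 1]):
            w = neighbors[i]      # Dependent load 2: the neighbor ID.
            if depth[w] == -1:    # Dependent load 3: the neighbor's visited state.
                depth[w] = depth[v] + 1
                frontier.append(w)
    return depth


if __name__ == "__main__":
    # Small hypothetical graph; vertex 5 has no incoming edges.
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    offsets, neighbors = build_csr(6, edges)
    print(offsets, neighbors)
    print(bfs_csr(offsets, neighbors, 0))
```

Each address in the inner loop is known only after the previous load returns. That is one reading of why the abstract argues for issuing these fetches from logic placed next to the DRAM layers rather than from the processor.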

DGAP: Efficient Dynamic Graph Analysis on Persistent Memory

EPGraph: An Efficient Graph Computing Model in Persistent Memory System

An Efficient Data Structure for Dynamic Graph on GPUs

NGraph: Parallel Graph Processing in Hybrid Memory Systems

FTGraph: A Flexible Tree-Based Graph Store on Persistent Memory for Large-Scale Dynamic Graphs

XPGraph: XPline-Friendly Persistent Memory Graph Stores for Large-Scale Evolving Graphs

A NUMA-aware Graph Database for Hybrid Memory System

An Edge Re-Ordering Based Acceleration Architecture for Improving Data Locality in Graph Analytics Applications

An Efficient ReRAM-based Accelerator for Asynchronous Iterative Graph Processing

GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing

PartitionedVC: Partitioned External Memory Graph Analytics Framework for SSDs

Fargraph+: Excavating the Parallelism of Graph Processing Workload on RDMA-based Far Memory System

Excavating the Potential of Graph Workload on RDMA-based Far Memory Architecture

Graphyti: A Semi-External Memory Graph Library for FlashGraph

GraphR: Accelerating Graph Processing Using ReRAM

GEAR: Graph-Evolving Aware Data Arranger to Enhance the Performance of Traversing Evolving Graphs on SCM

Dynamic-ACTS - A Dynamic Graph Analytics Accelerator For HBM-Enabled FPGAs

GraphM

DepGraph: A Dependency-Driven Accelerator for Efficient Iterative Graph Processing

CGAcc: A Compressed Sparse Row Representation-Based BFS Graph Traversal Accelerator on Hybrid Memory Cube

DiterGraph: Toward I/O-Efficient Incremental Computation over Large Graphs with Billion Edges