Abstract: Large-scale graph processing is challenging because of the size of the data and its irregular memory access patterns, which degrade performance on common architectures such as CPUs and GPUs. Recent research has explored accelerating graph processing with Field Programmable Gate Arrays (FPGAs). FPGAs can provide very efficient acceleration thanks to their reconfigurable on-chip resources. Although limited, these resources offer a larger design space than CPUs and GPUs. We propose an approach in which data are preprocessed into small chunks with an optimized graph partitioning technique for execution on FPGA accelerators. The chunks, located on the host, are streamed directly into a customized memory layer implemented in the FPGA, which is tightly coupled with the processing elements responsible for executing the graph algorithm. This reduces the application's memory access latency, which is crucial to performance in large-scale graph computing. This work presents a hardware design that, combined with graph partitioning, achieves high-performance and potentially scalable handling of large graphs (i.e., graphs with millions of vertices and billions of edges in current scenarios) for popular graph algorithms. For the PageRank algorithm, the proposed framework is 56 times faster than a CPU (a multicore with 16 logical cores in our reference experiments), and 2.5 times and 4 times faster than state-of-the-art FPGA and GPU solutions, respectively (the FPGA has 15 compute units, and the reference GPU has 128 streaming multiprocessors in our experiments). For the Single-Source Shortest Path (SSSP) algorithm, we achieve speedups of up to 65x, 26x, and 18x compared to CPU, GPU, and FPGA works, respectively. Lastly, for the Weakly Connected Components (WCC) algorithm, our framework achieves speedups of up to 403x compared to the CPU, 7.4x compared to the GPU, and 10.3x compared to the FPGA alternatives.
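As a rough software illustration of the chunked processing idea described in the abstract, the sketch below partitions an edge list into chunks and runs PageRank one chunk at a time. It is not the authors' partitioning technique or hardware design: the destination-interval partitioning rule, the function names `partition_edges` and `pagerank_chunked`, and the parameters `chunk_size`, `damping`, and `iterations` are all assumptions made for illustration, and each chunk merely stands in for a block that would be streamed from the host to the FPGA.

```python
def partition_edges(edges, num_vertices, chunk_size):
    """Group edges into chunks by destination-vertex interval (assumed rule)."""
    num_chunks = (num_vertices + chunk_size - 1) // chunk_size
    chunks = [[] for _ in range(num_chunks)]
    for src, dst in edges:
        chunks[dst // chunk_size].append((src, dst))
    return chunks


def pagerank_chunked(edges, num_vertices, chunk_size=2, damping=0.85, iterations=50):
    """Power-iteration PageRank that consumes the graph one chunk at a time."""
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    chunks = partition_edges(edges, num_vertices, chunk_size)
    rank = [1.0 / num_vertices] * num_vertices

    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        new_rank = [base] * num_vertices

        # Each chunk only writes to one interval of destination vertices,
        # which is what would let a small on-chip buffer hold the partial sums.
        for chunk in chunks:
            for src, dst in chunk:
                new_rank[dst] += damping * rank[src] / out_degree[src]

        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical 3-vertex cycle: 0 -> 1 -> 2 -> 0
    cycle = [(0, 1), (1, 2), (2, 0)]
    print(partition_edges(cycle, num_vertices=3, chunk_size=2))
    print(pagerank_chunked(cycle, num_vertices=3))
```

Grouping edges by destination interval keeps all updates for a chunk inside a bounded range of vertices, which is one common way to make the working set fit in a small, fast memory; the actual scheme used in the paper may differ.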
