Abstract: The development of FPGA-based applications using HLS is fraught with performance pitfalls and large design space exploration times. These issues are exacerbated when the application is complicated and its performance is dependent on the input data set, as is often the case with graph neural network approaches to machine learning. Here, we introduce HLPerf, an open-source, simulation-based performance evaluation framework for dataflow architectures that both supports early exploration of the design space and shortens the performance evaluation cycle. We apply the methodology to GNNHLS, an HLS-based graph neural network benchmark containing 6 commonly used graph neural network models and 4 datasets with distinct topologies and scales. The results show that HLPerf achieves over 10 000× average simulation acceleration relative to RTL simulation and over 400× acceleration relative to state-of-the-art cycle-accurate tools at the cost of 7% mean error rate relative to actual FPGA implementation performance. This acceleration positions HLPerf as a viable component in the design cycle.

What problem does this paper attempt to address?

This paper addresses the difficulty of evaluating and optimizing performance when developing FPGA-based applications with High-Level Synthesis (HLS) tools, especially for complex applications such as Graph Neural Networks (GNNs). Specifically, it focuses on three aspects:

1. **Complexity of performance evaluation**: Traditional performance evaluation methods, such as RTL simulation, are time-consuming and difficult to understand, and they become impractical for large-scale graph datasets. In addition, because graph datasets are irregular and the algorithms behave dynamically, traditional static performance estimation methods cannot provide accurate predictions either.
2. **Efficiency of design space exploration**: Optimizing HLS code requires exploring a large design space that includes different optimization strategies, coding paradigms, and so on. However, each design iteration requires a lengthy RTL simulation or FPGA execution, which greatly reduces the efficiency of design space exploration.
3. **Optimization of dataflow architectures**: GNNs usually adopt dataflow architectures, which connect multiple functions through FIFOs to achieve task-level parallelism. The design space of such architectures is very wide, covering task partitioning, FIFO depth adjustment, and bottleneck identification, all of which make optimization harder.

To address these problems, the paper proposes HLPerf, an open-source, approximately cycle-accurate performance evaluation framework. Its main contributions are:

- **Fast performance evaluation**: HLPerf provides approximately cycle-accurate performance evaluation through simulation. It is more than two orders of magnitude faster than existing cycle-accurate simulation tools, with a mean error rate of 7% relative to actual FPGA implementation performance.
- **Automatic code conversion**: HLPerf can automatically convert HLS C code into simulation components and supports multiple GNN operations.
- **Advanced quantitative expressions**: HLPerf proposes a set of advanced quantitative expressions that model the impact of various optimization techniques on performance, thereby guiding dataflow pipeline design before functional verification.
- **Comprehensive evaluation**: The paper evaluates HLPerf on 6 different GNN models, 4 graph datasets, and some general-purpose applications to verify both the accuracy of its performance predictions and the performance of the simulator itself.

In short, HLPerf aims to accelerate HLS design space exploration and improve the development efficiency of FPGA-based GNN applications through fast and accurate performance evaluation.
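To make the dataflow idea above more concrete, the following is a minimal toy sketch of two tasks connected by a bounded FIFO, stepped one cycle at a time. It is not HLPerf's simulator or API: the function name `simulate_pipeline`, its parameters, and the timing model (a producer that pushes one item per interval, and a consumer whose per-item cost depends on the data, such as a node's degree) are all assumptions made for illustration.

```python
from collections import deque


def simulate_pipeline(work, fifo_depth, producer_interval=1):
    """Toy cycle-stepped model of producer -> FIFO -> consumer.

    work: per-item consumer cost in cycles (each >= 1), e.g. node degrees.
    fifo_depth: maximum number of items the FIFO can hold.
    Returns (total_cycles, producer_stall_cycles).
    Hypothetical model for illustration only.
    """
    if fifo_depth < 1 or any(c < 1 for c in work):
        raise ValueError("fifo_depth and every cost must be >= 1")

    fifo = deque()
    next_item = 0        # index of the next item the producer will push
    producer_wait = 0    # cycles until the producer may push again
    consumer_busy = 0    # cycles left on the consumer's current item
    done = 0             # items fully processed by the consumer
    stalls = 0           # cycles the producer spent blocked on a full FIFO
    cycle = 0

    while done < len(work):
        cycle += 1

        # Consumer: keep working on the current item, or pop a new one.
        if consumer_busy > 0:
            consumer_busy -= 1
            if consumer_busy == 0:
                done += 1
        elif fifo:
            consumer_busy = fifo.popleft() - 1  # this cycle counts as work
            if consumer_busy == 0:
                done += 1

        # Producer: push when ready and the FIFO has room, else stall.
        if next_item < len(work):
            if producer_wait > 0:
                producer_wait -= 1
            elif len(fifo) < fifo_depth:
                fifo.append(work[next_item])
                next_item += 1
                producer_wait = producer_interval - 1
            else:
                stalls += 1

    return cycle, stalls


if __name__ == "__main__":
    degrees = [3, 3, 3]  # made-up per-node costs
    for depth in (1, 2, 4):
        cycles, stalls = simulate_pipeline(degrees, depth)
        print(f"depth={depth}: cycles={cycles}, producer stalls={stalls}")
```

In this toy model, deepening the FIFO removes producer stalls but does not shorten the run when the consumer is the bottleneck, which is the kind of question (FIFO depth versus bottleneck identification) that a fast simulator lets a designer answer without a full RTL run.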

HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures

GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis

Graph Neural Networks for High-Level Synthesis Design Space Exploration

PowerGear: Early-Stage Power Estimation in FPGA HLS via Heterogeneous Edge-Centric GNNs

A Survey on Performance Optimization of High-Level Synthesis Tools

HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis

AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs

Pflow: An end-to-end heterogeneous acceleration framework for CNN inference on FPGAs

High-level synthesis: productivity, performance, and software constraints

H2PIPE: High throughput CNN Inference on FPGAs with High-Bandwidth Memory

HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis

HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation

HitGNN: High-throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform

FP-DNN: an Automated Framework for Mapping Deep Neural Networks Onto FPGAs with RTL-HLS Hybrid Templates

HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference

HAO: Hardware-aware neural Architecture Optimization for Efficient Inference

A high throughput acceleration for hybrid neural networks with efficient resource management on FPGA

fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs