HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures

Chenfeng Zhao, Clayton J. Faber, Roger D. Chamberlain, Xuan Zhang
DOI: https://doi.org/10.1145/3655627
IF: 2.837
2024-04-03
ACM Transactions on Reconfigurable Technology and Systems
Abstract: The development of FPGA-based applications using HLS is fraught with performance pitfalls and large design space exploration times. These issues are exacerbated when the application is complicated and its performance is dependent on the input data set, as is often the case with graph neural network approaches to machine learning. Here, we introduce HLPerf, an open-source, simulation-based performance evaluation framework for dataflow architectures that both supports early exploration of the design space and shortens the performance evaluation cycle. We apply the methodology to GNNHLS, an HLS-based graph neural network benchmark containing 6 commonly used graph neural network models and 4 datasets with distinct topologies and scales. The results show that HLPerf achieves over 10 000 × average simulation acceleration relative to RTL simulation and over 400 × acceleration relative to state-of-the-art cycle-accurate tools at the cost of 7% mean error rate relative to actual FPGA implementation performance. This acceleration positions HLPerf as a viable component in the design cycle.
computer science, hardware & architecture
What problem does this paper attempt to address?
This paper addresses the difficulty of evaluating and optimizing performance when FPGA-based applications are developed with High-Level Synthesis (HLS) tools, particularly for complex applications such as Graph Neural Networks (GNNs). It focuses on the following aspects:

1. **Complexity of performance evaluation**: Traditional performance evaluation methods, such as RTL simulation, are time-consuming and produce results that are hard to interpret. They become impractical for large-scale graph datasets. In addition, because graph datasets are irregular and the algorithms behave dynamically, traditional static performance estimation methods cannot provide accurate predictions either.
2. **Efficiency of design space exploration**: Optimizing HLS code means exploring a large design space that includes different optimization strategies, coding paradigms, and so on. However, each design iteration requires a lengthy RTL simulation or FPGA execution, which greatly reduces the efficiency of the exploration.
3. **Optimization of dataflow architectures**: GNNs usually adopt dataflow architectures, which connect multiple functions through FIFOs to achieve task-level parallelism. The design space of dataflow architectures is large, covering task partitioning, FIFO depth adjustment, and bottleneck identification, all of which make optimization harder.

To address these problems, the paper proposes HLPerf, an open-source, approximately cycle-accurate performance evaluation framework. Its main contributions are:

- **Fast performance evaluation**: HLPerf provides approximately cycle-accurate performance evaluation through simulation. It is two orders of magnitude faster than existing cycle-accurate simulation tools (over 400× on average, per the abstract), at a mean error of 7% relative to actual FPGA implementation performance.
- **Automatic code conversion**: HLPerf can automatically convert HLS C code into simulation components and supports multiple GNN operations.
- **Advanced quantitative expressions**: HLPerf proposes a set of advanced quantitative expressions that model the impact of various optimization techniques on performance, guiding the dataflow pipeline design before functional verification.
- **Comprehensive evaluation**: The paper evaluates HLPerf on 6 GNN models, 4 graph datasets, and some general-purpose applications, verifying both the accuracy of its performance predictions and the performance of the simulator itself.

In short, HLPerf aims to accelerate the exploration of the HLS design space and improve the development efficiency of FPGA-based GNN applications through fast and accurate performance evaluation.
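
To make the point about FIFO depth and data-dependent behavior concrete, here is a minimal Python sketch of a two-stage dataflow pipeline (producer, bounded FIFO, consumer). It is not HLPerf's model or code: the function name `pipeline_cycles`, the per-token cost lists, and the timing rules (a blocking producer, zero FIFO latency, a push allowed in the same cycle as the pop that frees the slot) are all assumptions chosen for illustration.

```python
def pipeline_cycles(prod_costs, cons_costs, fifo_depth):
    """Estimate total cycles for a producer -> FIFO -> consumer pipeline.

    Hypothetical toy model, not taken from the paper:
    - prod_costs[i] / cons_costs[i] are the cycles each stage spends on token i
      (data-dependent, e.g. proportional to a node's degree in a graph).
    - The producer blocks when the FIFO is full; the consumer blocks when empty.
    """
    if len(prod_costs) != len(cons_costs):
        raise ValueError("both stages must process the same number of tokens")
    if fifo_depth < 1:
        raise ValueError("fifo_depth must be at least 1")

    pop_times = []   # cycle at which the consumer removes each token
    prev_push = 0    # cycle at which the producer pushed the previous token
    prev_done = 0    # cycle at which the consumer finished the previous token

    for i, (p_cost, c_cost) in enumerate(zip(prod_costs, cons_costs)):
        # The producer starts token i right after pushing token i-1.
        push = prev_push + p_cost
        # A slot is free only once token (i - fifo_depth) has been popped.
        if i >= fifo_depth:
            push = max(push, pop_times[i - fifo_depth])
        # The consumer pops once the token is there and it is idle.
        pop = max(push, prev_done)
        pop_times.append(pop)
        prev_push = push
        prev_done = pop + c_cost

    return prev_done


if __name__ == "__main__":
    # Producer is fast then slow; consumer is slow then fast.
    producer = [1, 1, 1, 1, 4, 4, 4, 4]
    consumer = [4, 4, 4, 4, 1, 1, 1, 1]
    for depth in (1, 2, 4, 8):
        print(depth, pipeline_cycles(producer, consumer, depth))
```

Under these assumptions the workload takes 26 cycles with a depth-1 FIFO, 22 with depth 2, and 21 with depth 4 or 8: a deeper FIFO lets the fast phase of the producer run ahead, and past depth 4 extra buffering no longer helps. Because the result depends on the per-token costs, a static estimate that ignores the input data cannot predict it, which is the motivation for simulation-based evaluation described above.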