Accelerating RNNs on FPGA with HBM

Wu Xifeng, Gong Jie, Fan Jun, He Hu
DOI: https://doi.org/10.19304/j.issn1000-7180.2021.0023
2021-01-01
Abstract: To address the problem that recurrent neural network (RNN) algorithms are limited by memory bandwidth, an accelerator SoC based on HBM is designed that supports the RNN and its variants in general. First, the structure of the RNN and its variants, together with the computation and storage requirements of these algorithms, are analyzed. Then, a high-bandwidth accelerator design based on HBM is proposed and deployed on the Xilinx VCU128 development board. Finally, following the Roofline model analysis method, the bandwidth and computational density are improved. The average inference performance measured on the DeepSpeech2 and GNMT algorithms is 61.74 GFLOP/s and 20 GFLOP/s, respectively. Compared with the design based on DDR memory, performance is improved by 3.68 times. Compared with other FPGA-based 32-bit floating-point recurrent neural network accelerator designs, performance is improved by 8.5 times. The design proposes a data scheduling method for multi-channel memory and can adapt to different recurrent neural network applications.
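The Roofline reasoning mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard Roofline bound (attainable performance is the smaller of peak compute and bandwidth times arithmetic intensity) and a batch-1 FP32 matrix-vector product as the dominant RNN operation. The function names and all numeric values below are hypothetical placeholders chosen for illustration; they are not measurements or hardware specifications from the paper.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def matvec_intensity(rows, cols, bytes_per_value=4):
    """Arithmetic intensity (FLOP/byte) of y = W @ x with no weight reuse.

    Each weight contributes one multiply and one add (2 FLOPs), while the
    weight matrix, the input vector, and the output vector are each moved once.
    """
    flops = 2 * rows * cols
    bytes_moved = bytes_per_value * (rows * cols + cols + rows)
    return flops / bytes_moved


# Hypothetical numbers, for illustration only (not taken from the paper).
PEAK_GFLOPS = 1000.0       # assumed peak compute of the accelerator
LOW_BANDWIDTH_GBS = 20.0   # assumed bandwidth of a DDR-like memory
HIGH_BANDWIDTH_GBS = 400.0 # assumed bandwidth of an HBM-like memory

intensity = matvec_intensity(1024, 1024)  # just under 0.5 FLOP/byte for FP32
low = attainable_gflops(PEAK_GFLOPS, LOW_BANDWIDTH_GBS, intensity)
high = attainable_gflops(PEAK_GFLOPS, HIGH_BANDWIDTH_GBS, intensity)
print(f"intensity = {intensity:.3f} FLOP/byte")
print(f"bound with low bandwidth  = {low:.1f} GFLOP/s")
print(f"bound with high bandwidth = {high:.1f} GFLOP/s")
```

Because the intensity of a batch-1 matrix-vector product stays below 0.5 FLOP/byte in FP32, both bounds in this sketch sit on the bandwidth-limited slope of the roofline, which is why raising memory bandwidth raises the attainable performance.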