High Throughput CNN Accelerator Design Based on FPGA

Liang Xie, Xitian Fan, Wei Cao, Lingli Wang
DOI: https://doi.org/10.1109/fpt.2018.00052
2018-01-01
Abstract: Because the on-chip memory capacity of FPGAs has increased significantly, the feature maps and weights of convolutional layers can now be stored on chip, which reduces data movement between on-chip and off-chip memory. As a result, the performance bottleneck of convolutional layers can shift from memory bandwidth to computing resources, which can improve performance dramatically. In this setting, this paper uses the roofline model to quantitatively analyze how to design the hardware architecture so that performance is maximized under the constraints of the available on-chip computing resources, and proposes an efficient architecture. Our accelerator is implemented on a Xilinx UltraScale+ FPGA and achieves 9.39 TOPS on ResNet-50 and 6.86 TOPS on AlexNet with 8-bit data, a 100 MHz main clock, and a 400 MHz DSP clock, outperforming existing FPGA-based CNN accelerators.
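The argument above rests on the roofline model: attainable performance is the minimum of the peak compute rate and the off-chip bandwidth multiplied by the operational intensity (operations per byte of off-chip traffic). The Python sketch below illustrates why keeping weights and feature maps on chip can move a convolutional layer from the bandwidth-bound region to the compute-bound region. It is not the authors' analysis. The layer shape, the 10,000 GOPS compute roof, the 19.2 GB/s bandwidth, the stride-1 "same"-padding geometry, and the "each datum crosses the off-chip interface exactly once" traffic model are all hypothetical assumptions, and `ConvLayer`, `off_chip_bytes`, and `roofline` are illustrative names only.

```python
import math
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """One conv layer; stride 1 and 'same' padding are ASSUMED (input size == output size)."""
    in_channels: int
    out_channels: int
    height: int
    width: int
    kernel: int

    def ops(self) -> int:
        # Each multiply-accumulate counts as two operations (multiply + add).
        return (2 * self.kernel * self.kernel * self.in_channels
                * self.out_channels * self.height * self.width)

    def weight_bytes(self) -> int:
        # 8-bit data: one byte per weight.
        return self.kernel * self.kernel * self.in_channels * self.out_channels

    def fmap_bytes(self) -> int:
        # Input plus output feature maps, one byte per activation.
        return (self.in_channels + self.out_channels) * self.height * self.width


def off_chip_bytes(layer: ConvLayer, weights_on_chip: bool, fmaps_on_chip: bool) -> int:
    """Off-chip traffic under the hypothetical 'each datum moved once' model."""
    total = 0
    if not weights_on_chip:
        total += layer.weight_bytes()
    if not fmaps_on_chip:
        total += layer.fmap_bytes()
    return total


def roofline(peak_gops: float, bandwidth_gbs: float, layer: ConvLayer,
             weights_on_chip: bool, fmaps_on_chip: bool) -> tuple[float, str]:
    """Attainable GOPS = min(peak compute, bandwidth * operational intensity)."""
    traffic = off_chip_bytes(layer, weights_on_chip, fmaps_on_chip)
    # Zero off-chip traffic means unbounded intensity, so only the compute roof applies.
    intensity = math.inf if traffic == 0 else layer.ops() / traffic  # ops per byte
    bandwidth_roof = bandwidth_gbs * intensity  # (ops/byte) * (GB/s) = GOPS
    if bandwidth_roof < peak_gops:
        return bandwidth_roof, "bandwidth-bound"
    return peak_gops, "compute-bound"


if __name__ == "__main__":
    layer = ConvLayer(in_channels=256, out_channels=256, height=14, width=14, kernel=3)
    PEAK_GOPS = 10_000.0   # hypothetical compute roof
    BANDWIDTH_GBS = 19.2   # hypothetical off-chip bandwidth
    for on_chip in (False, True):
        perf, regime = roofline(PEAK_GOPS, BANDWIDTH_GBS, layer, on_chip, on_chip)
        print(f"data on chip={on_chip}: {perf:.0f} GOPS ({regime})")
```

With these made-up numbers, the all-off-chip case lands under the bandwidth roof, while the fully on-chip case reaches the compute roof. Once that happens, the remaining design question is how to use the available DSP resources efficiently, which is the focus the abstract describes.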