A Scalable FPGA Accelerator for Convolutional Neural Networks.

Ke Xu, Xiaoyun Wang, Shihang Fu, Dong Wang
DOI: https://doi.org/10.1007/978-981-13-2423-9_1
2018-01-01
Abstract: Convolutional Neural Networks (CNNs) have achieved undisputed success in many practical applications, such as image classification, face detection, and speech recognition. FPGA-based CNN prediction is generally considered more efficient than GPU-based schemes, especially in terms of power consumption. In addition, OpenCL-based high-level synthesis tools for FPGAs are widely used because of their fast verification and implementation flows. In this paper, we propose an FPGA accelerator with a scalable architecture of deeply pipelined OpenCL kernels. The design is verified by implementing three representative large-scale CNNs, AlexNet, VGG-16, and ResNet-50, on an Altera OpenCL DE5-Net FPGA board. Our design achieves a peak performance of 141 GOPS for the convolution operation and 103 GOPS for the entire VGG-16 network performing ImageNet classification on the DE5-Net board.