FPGA Implementation of Quantized Convolutional Neural Networks

Qi Zhang, Jian Cao, Ying Zhang, Shiguang Zhang, Quan Zhang, Dunshan Yu
DOI: https://doi.org/10.1109/icct46805.2019.8947168
2019-01-01
Abstract: Convolutional neural networks (CNNs) are the most commonly used technique in computer vision tasks. CNN-based methods are widely used, especially in areas such as face recognition, object detection, and speech recognition. The heavy computational load of CNNs requires dedicated hardware such as graphics processing units (GPUs). Because such hardware is not portable in real-world settings, there is an urgent need to deploy neural networks on FPGAs. High-level synthesis (HLS) provides a convenient programming environment for developers and makes FPGA programming more efficient. In this paper, we introduce an HLS-based accelerator architecture for quantized convolutional neural networks (QCNN), which uses parameters quantized during training and general-purpose processing elements during inference to improve performance. Because a QCNN involves fewer parameter operations, it is well suited to on-chip storage. The QCNN accelerator implements batch normalization with a fast algorithm that greatly reduces hardware consumption while maintaining accuracy. We implemented the proposed architecture on the Nexys Video FPGA platform; at a clock frequency of 100 MHz, the QCNN accelerator reaches a peak performance of 22 GOP/s. Finally, we describe the design and implementation of a QCNN accelerator system for mobile video, in which the FPGA is connected to an OV5640 camera to perform image classification on real-time video.
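The abstract does not spell out the fast batch-normalization algorithm. One common approach in quantized accelerators, assumed here purely for illustration and not confirmed as the authors' method, is to fold the trained BN parameters into a single per-channel scale a and offset b offline. Inference then needs only one multiply, one add, and one shift per output instead of a subtraction, a division, and a square root. The Python sketch below shows this folding with fixed-point constants; the function names and the 12-bit fractional width are hypothetical choices.

```python
import math


def bn_reference(x, gamma, beta, mean, var, eps=1e-5):
    """Floating-point batch normalization for one value (the textbook formula)."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(gamma, beta, mean, var, eps=1e-5):
    """Fold BN into y = a * x + b; a hypothetical offline step done once per channel."""
    a = gamma / math.sqrt(var + eps)
    b = beta - a * mean
    return a, b


def to_fixed(value, frac_bits):
    """Round a real constant to a signed fixed-point integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def bn_fixed(acc, a_fx, b_fx, frac_bits):
    """Inference-time BN on an integer accumulator: one multiply, one add, one shift.

    Python's >> on negative ints is an arithmetic (floor) shift, matching a
    signed right shift in hardware.
    """
    return (acc * a_fx + b_fx) >> frac_bits


if __name__ == "__main__":
    FRAC_BITS = 12  # assumed fractional width; the abstract does not state one
    gamma, beta, mean, var = 0.5, 0.25, 1.0, 0.09
    a, b = fold_batchnorm(gamma, beta, mean, var)
    a_fx, b_fx = to_fixed(a, FRAC_BITS), to_fixed(b, FRAC_BITS)
    for acc in (-20, 0, 7, 50):
        exact = bn_reference(acc, gamma, beta, mean, var)
        approx = bn_fixed(acc, a_fx, b_fx, FRAC_BITS)
        print(f"acc={acc:4d}  float={exact:8.3f}  fixed={approx:4d}")
```

Because a and b are computed once from the trained parameters, a design of this kind only has to store two constants per channel, and the floor shift keeps each result within one unit of the floating-point value for small accumulators. Whether the paper's accelerator uses exactly this scheme is not stated in the abstract.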