Explore Efficient LUT-based Architecture for Quantized Convolutional Neural Networks on FPGA

Yanpeng Cao, Chengcheng Wang, Yongming Tang
DOI: https://doi.org/10.1109/fccm48280.2020.00065
2020-05-01
Abstract: The heavy computational load of convolutional neural networks limits the speed of forward inference in hardware. In recent years, network quantization has made it possible to quantize a network to low bit widths while retaining its original performance, yet the complexity of the quantized network remains considerable. An FPGA is a highly parallel platform that contains a large number of configurable logic resources. We study the feasibility of implementing convolution purely with LUTs, introduce shift multipliers and addition trees, and propose an efficient architecture for quantized neural networks (QNNs) on FPGA. By optimizing the Winograd algorithm for QNNs, we demonstrate that our scheme significantly reduces the number of multipliers and cuts LUT usage by at least $2.25 \times$ without using DSP resources. As a result, our LUT-based architecture for QNNs reduces latency by up to $19.3 \times$ and delivers better performance than the other methods compared.
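The two ideas named in the abstract, shift multipliers and Winograd-style multiplier reduction, can be illustrated with a minimal software sketch. It is not the paper's hardware design: the function names, the power-of-two weight encoding, and the choice of the 1-D F(2,3) tile are assumptions made for illustration. F(2,3) produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The 2-D counterpart F(2×2, 3×3) needs 16 instead of 36, a ratio of 2.25, which matches the figure quoted in the abstract, although the abstract does not say which tile size the authors use.

```python
from fractions import Fraction


def shift_multiply(x: int, exponent: int, negative: bool = False) -> int:
    """Multiply x by a weight of the form +/-2**exponent using only a shift.

    Assumes weights are quantized to signed powers of two (hypothetical encoding).
    """
    product = x << exponent
    return -product if negative else product


def direct_f23(d, g):
    """Two outputs of a 3-tap sliding dot product: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3): 4 multiplications."""
    # Filter transform; in practice this is precomputed once per filter.
    u0 = Fraction(g[0])
    u1 = Fraction(g[0] + g[1] + g[2], 2)
    u2 = Fraction(g[0] - g[1] + g[2], 2)
    u3 = Fraction(g[2])
    # Input transform uses only additions and subtractions.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: additions only (an adder tree in hardware).
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    weights = [1, -2, 4]  # each weight is a signed power of two
    print(direct_f23(data, weights))
    print([int(y) for y in winograd_f23(data, weights)])
    print(shift_multiply(5, 3))  # 5 * 8 computed with a shift
```

Exact fractions are used for the filter transform so that the sketch stays bit-exact; a hardware design would typically absorb the factor of 1/2 into the fixed-point format instead.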