A hardware-friendly logarithmic quantization method for CNNs and FPGA implementation

Jiang, Tao; Yu, Jinming
DOI: https://doi.org/10.1007/s11554-024-01484-y
IF: 2.293
2024-06-07
Journal of Real-Time Image Processing
Abstract: Convolutional Neural Networks (CNNs) have been widely used in various fields due to their high accuracy and efficiency. The performance of CNNs is mainly affected by the computing capability, memory bandwidth, and flexibility of embedded devices. The high energy efficiency, computing capability, and reconfigurability of FPGAs make them a good platform for hardware acceleration of CNNs. However, the increasing complexity of CNNs requires more memory, while FPGA on-chip storage is limited. Therefore, we use an improved logarithmic quantization to compress the model. This approach allows a significant reduction in bit widths while maintaining high accuracy, making it an effective compression method. In this work, a hardware-friendly quantization scheme is proposed in which the weights use an improved logarithmic quantization scheme and the activations use fixed-point-to-logarithmic quantization. The results show that the quantized model has negligible Top-1/5 accuracy loss without any retraining. In addition, we implement an acceleration engine for a heterogeneous Generalized Matrix Multiplication (GEMM) core on a Zynq XC7Z020. In the GEMM core, the multipliers are replaced by logic shifters and adders, which achieves efficient utilization of LUT resources. We deploy the optimal quantized model on the Zynq XC7Z020. The throughput reaches 69.7 GOPs with a power consumption of 6.008 W, and the resource efficiency is 8.713 GOPs/DSP or 5.564 GOPs/kLUTs.
computer science, artificial intelligence; engineering, electrical & electronic; imaging science & photographic technology
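
The abstract's central idea, replacing multipliers with shifters once weights are powers of two, can be illustrated with a minimal sketch. This is plain power-of-two logarithmic quantization, not the paper's improved scheme, whose details the abstract does not give. The function names, the 4-bit exponent width, the clipping range, and the 8 fractional bits of the fixed-point activations are all assumptions made for illustration.

```python
import math

def log2_quantize(w, bits=4):
    """Approximate a weight as sign * 2**exp and return (sign, exp).

    Assumption: weights have magnitude <= 1, so exponents are clipped
    to the range [-(2**bits - 1), 0]. A zero weight gets sign 0.
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))      # nearest power-of-two exponent
    exp_min = -(2 ** bits - 1)
    exp = max(exp_min, min(0, exp))     # clip to the assumed exponent range
    return sign, exp

def shift_multiply(x, sign, exp):
    """Multiply a non-negative integer activation by sign * 2**exp using only a shift."""
    if sign == 0:
        return 0
    shifted = x << exp if exp >= 0 else x >> -exp
    return sign * shifted

def dot_shift_add(acts_fixed, weights, bits=4):
    """Dot product using shifts and adds only; acts_fixed are fixed-point integers."""
    acc = 0
    for x, w in zip(acts_fixed, weights):
        sign, exp = log2_quantize(w, bits)
        acc += shift_multiply(x, sign, exp)
    return acc

# Activations 1.0 and 2.0 in fixed point with 8 fractional bits (assumed format).
acts = [1 << 8, 2 << 8]
result = dot_shift_add(acts, [0.25, -0.5])
print(result / (1 << 8))  # 1.0*0.25 + 2.0*(-0.5)
```

Because each quantized weight is stored as a sign and a small exponent, each product in the loop needs only a shift and each accumulation only an add, which is why such a design can map onto LUTs rather than DSP multipliers.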