Memory-Efficient Compression Based on Least-Squares Fitting in Convolutional Neural Network Accelerators.

Hang Xu, Chenjia Xie, Xin Lu, Li Du, Yuan Du
DOI: https://doi.org/10.1109/ASICON58565.2023.10396197
2023-01-01
Abstract: Convolutional Neural Networks (CNNs) generate large volumes of interlayer feature-map data during inference. To improve throughput and energy efficiency in embedded systems, feature-map compression has been explored as a way to reduce data movement. In this paper, we present a hardware compression scheme based on least-squares fitting (LSF) that substantially reduces the amount of interlayer data generated during CNN inference. The mean-square error (MSE) threshold for the LSF is optimized with a derivative-free algorithm. The scheme can be combined with Huffman coding to further compress the interlayer data. A prototype accelerator equipped with the compression scheme is implemented in TSMC 28-nm CMOS technology. We design an approximate calculation circuit for the heavily used divider in the compression method, which substantially reduces area; the resulting overhead is less than 10% of the systolic array area. The compression scheme achieves a 1.43x to 1.70x reduction in interlayer feature-map size with light hardware area overhead.
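The abstract does not specify how feature maps are segmented or encoded, so the following is only a minimal sketch of the general idea behind threshold-gated least-squares compression, under assumptions that are not from the paper: the flattened feature map is split into fixed-length segments, each segment is fitted with a straight line by least squares, and the two line coefficients replace the raw values whenever the fit's MSE is at or below a threshold. The function names (fit_line, compress_segment, compress_feature_map, stored_words), the segment length of 8, and the cost of two words per fitted segment are hypothetical; quantization, per-segment flag bits, the approximate divider, and Huffman coding are omitted.

```python
from typing import List, Tuple

# Hypothetical encoding: ("fit", slope, intercept) or ("raw", values).
Encoded = Tuple


def fit_line(values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of values[i] ~ slope * i + intercept."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0  # sxx is 0 only for a single value
    return slope, mean_y - slope * mean_x


def mse(values: List[float], slope: float, intercept: float) -> float:
    """Mean-square error of the fitted line over the segment."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in enumerate(values)) / len(values)


def compress_segment(values: List[float], mse_threshold: float) -> Encoded:
    # Segments of <= 2 values gain nothing from a 2-coefficient fit.
    if len(values) > 2:
        slope, intercept = fit_line(values)
        if mse(values, slope, intercept) <= mse_threshold:
            return ("fit", slope, intercept)
    return ("raw", list(values))


def decompress_segment(enc: Encoded, n: int) -> List[float]:
    """Rebuild n values: evaluate the line for a fit, copy raw values otherwise."""
    if enc[0] == "fit":
        _, slope, intercept = enc
        return [slope * i + intercept for i in range(n)]
    return list(enc[1])


def compress_feature_map(flat: List[float], seg_len: int,
                         mse_threshold: float) -> List[Encoded]:
    return [compress_segment(flat[i:i + seg_len], mse_threshold)
            for i in range(0, len(flat), seg_len)]


def stored_words(encoded: List[Encoded]) -> int:
    # Assumed cost: 2 words per fitted segment, one word per raw value.
    return sum(2 if e[0] == "fit" else len(e[1]) for e in encoded)


fmap = [0, 1, 2, 3, 4, 5, 6, 7,   # linear segment: replaced by its fit
        5, 0, 9, 1, 8, 2, 7, 3]   # noisy segment: kept raw
enc = compress_feature_map(fmap, seg_len=8, mse_threshold=0.5)
print(stored_words(enc), "words vs", len(fmap))  # 10 words vs 16
```

Raising mse_threshold lets more segments be replaced by their fit, trading reconstruction accuracy for compression. The abstract states that the paper tunes this threshold with a derivative-free algorithm; the sketch does not attempt to reproduce that search.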