Bandwidth-efficient Inference for Neural Image Compression

Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu
2023-09-07
Abstract: With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (DRAM) and power constraints have become a bottleneck for network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable, bandwidth-efficient neural inference method in which activations are compressed with a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression, using symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. When optimized together with existing model quantization methods, the low-level task of image compression achieves up to 19x bandwidth reduction with 6.21x energy saving.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper addresses the bottleneck that limited external memory (DRAM) bandwidth and power constraints impose on neural network inference on mobile and edge devices. It proposes an end-to-end differentiable, bandwidth-efficient inference method that compresses activation feature maps with neural data compression techniques before they are exchanged with external memory. Combined with existing model quantization methods, the method achieves up to 19x bandwidth reduction and 6.21x energy saving on image compression. The main contribution is a transform-quantization-entropy coding pipeline for activation compression. Within this pipeline, symmetric exponential Golomb coding and a data-dependent Gaussian entropy model are used to calculate the actual byte size of the activation feature maps, and that size serves as part of the optimization objective. Additionally, the paper describes itself as the first to apply bandwidth-efficient neural inference to a low-level task such as image compression and to validate its effectiveness there.
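To make concrete how the size of a quantized activation map can be counted, here is a minimal Python sketch of the two size estimates mentioned above. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not define "symmetric" exponential Golomb coding, so the sketch assumes the common order-0 signed mapping (positive v maps to 2v - 1, non-positive v maps to -2v). The Gaussian estimate assumes unit-width quantization bins. All function names are hypothetical.

```python
import math


def signed_to_unsigned(v: int) -> int:
    # Assumed signed mapping: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ...
    return 2 * v - 1 if v > 0 else -2 * v


def exp_golomb_encode(v: int) -> str:
    # Order-0 exponential Golomb code: the binary form of (u + 1),
    # preceded by as many zeros as it has bits minus one.
    binary = bin(signed_to_unsigned(v) + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def exp_golomb_bits(v: int) -> int:
    # Code length without building the string: 2 * floor(log2(u + 1)) + 1.
    return 2 * (signed_to_unsigned(v) + 1).bit_length() - 1


def activation_bits(values) -> int:
    # Total coded size, in bits, of a flattened quantized activation map.
    return sum(exp_golomb_bits(v) for v in values)


def gaussian_bits(q: int, mu: float, sigma: float) -> float:
    # Ideal arithmetic-coding cost of integer symbol q under a Gaussian
    # with mean mu and scale sigma, integrated over [q - 0.5, q + 0.5].
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    p = max(cdf(q + 0.5) - cdf(q - 0.5), 1e-9)  # floor avoids log2(0)
    return -math.log2(p)
```

Small magnitudes get short codes (`exp_golomb_encode(0)` is a single bit), so activations concentrated around zero cost little to store. Note that this integer bit count is not itself differentiable; end-to-end training needs a differentiable surrogate for it, and this sketch does not attempt to reproduce the one used in the paper.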