Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks
Yu-Shan Tai, Cheng-Yang Chang, Chieh-Fang Teng, Yi-Ta Chen, An-Yeu Wu
DOI: https://doi.org/10.1109/tcad.2023.3248503
IF: 2.9
2023-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access of activations consumes considerable energy, which makes deploying CNNs on resource-constrained edge devices a great challenge. Existing research applies dimension reduction and mixed-precision quantization separately to reduce computational complexity, without paying attention to their interaction. Such naïve concatenation of different compression strategies yields sub-optimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers dimension reduction and mixed-precision quantization, enabled by independent group-wise learnable mixed-precision schemes. Group partitioning is guided by an automatic group partition mechanism that distinguishes compression priorities among channels and handles the trade-off between model accuracy and compressibility. Moreover, to preserve model accuracy under low bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, incurring only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to dimension reduction alone and by 55%/63% (-2.60/-4.52 bits) relative to mixed-precision quantization alone on ResNet18/MobileNetV2, with comparable or even higher accuracy.
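The relative memory-access reductions quoted in the abstract can be cross-checked from the reported bit counts. The short Python sketch below is only an arithmetic check, not code from the paper. It assumes that each baseline's average bit-width equals the proposed system's average bits per value plus the reported bit saving; the function and variable names are illustrative.

```python
# Average bits per activation value reported for the proposed joint system.
proposed_bits = {"ResNet18": 2.16, "MobileNetV2": 2.61}

# Bit savings reported in the abstract relative to each standalone scheme.
saved_bits = {
    "dimension reduction": {"ResNet18": 2.62, "MobileNetV2": 0.27},
    "mixed-precision quantization": {"ResNet18": 2.60, "MobileNetV2": 4.52},
}


def relative_reduction(proposed, saved):
    """Return (baseline_bits, percent_reduction).

    Assumption (not stated explicitly in the abstract):
    baseline_bits = proposed + saved.
    """
    baseline = proposed + saved
    return baseline, 100.0 * saved / baseline


def summarize():
    """Collect (scheme, model, baseline_bits, rounded_percent) rows."""
    rows = []
    for scheme, per_model in saved_bits.items():
        for model, saved in per_model.items():
            baseline, pct = relative_reduction(proposed_bits[model], saved)
            rows.append((scheme, model, round(baseline, 2), round(pct)))
    return rows


for row in summarize():
    print(row)
```

Under that assumption the implied baselines are 4.78/2.88 bits for dimension reduction and 4.76/7.13 bits for mixed-precision quantization, and the rounded reductions come out to 55%/9% and 55%/63%, which agrees with the percentages stated in the abstract.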
engineering, electrical & electronic,computer science, interdisciplinary applications, hardware & architecture