CSMPQ: Class Separability Based Mixed-Precision Quantization.

Mingkai Wang, Taisong Jin, Miaohui Zhang, Zhengtao Yu
DOI: https://doi.org/10.1007/978-981-99-4755-3_47
2023-01-01
Abstract: Network quantization has become increasingly popular due to its ability to reduce storage requirements and accelerate inference. However, ultra-low-bit quantization is still challenging due to significant performance degradation. Mixed-precision quantization has been introduced as a solution that achieves speedup while maintaining accuracy as much as possible by assigning different bit-widths to different layers. However, existing methods either focus on the sensitivity of different network layers, neglecting the intrinsic attributes of activations, or require a reinforcement learning and neural architecture search process to obtain the optimal bit-width configuration, which is time-consuming. To address these limitations, we propose a new mixed-precision quantization method based on the class separability of layer-wise feature maps. Specifically, we extend the widely used term frequency-inverse document frequency (TF-IDF) to measure the class separability of layer-wise feature maps. We identify that layers with lower class separability can be quantized to lower bit-widths. Furthermore, we design a linear programming problem to derive the optimal bit configuration. Without any iterative process, our proposed method, CSMPQ, achieves better compression trade-offs than state-of-the-art quantization algorithms. Specifically, for Quantization-Aware Training, we achieve a Top-1 accuracy of 73.03% on ResNet-18 with only 63 GBOPs, and for Post-Training Quantization, a Top-1 accuracy of 71.30% on MobileNetV2 with 1.5 Mb.
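The idea of turning per-layer separability scores into a bit configuration with a linear program can be illustrated with a toy sketch. The formulation below is an assumption made for illustration, not the objective or constraints used by CSMPQ: it maximizes the separability-weighted sum of bit-widths under a total model-size budget, with continuous bit-widths bounded by hypothetical `b_min` and `b_max`. For this particular LP (a fractional knapsack), a greedy fill ordered by score per parameter is optimal, so no solver is needed. The function name `allocate_bits` and all inputs are hypothetical.

```python
def allocate_bits(separability, num_params, budget_bits, b_min=2, b_max=8):
    """Toy LP: maximize sum_l s_l * b_l
    subject to sum_l n_l * b_l <= budget_bits and b_min <= b_l <= b_max.

    Assumes all scores and parameter counts are positive. Bit-widths are
    continuous here; a real method would round to supported bit-widths.
    """
    n_layers = len(separability)
    bits = [float(b_min)] * n_layers

    # Every layer gets at least b_min bits; check what budget is left over.
    remaining = budget_bits - sum(n * b_min for n in num_params)
    if remaining < 0:
        raise ValueError("budget is too small to give every layer b_min bits")

    # Spend the leftover budget on layers with the best score per parameter.
    order = sorted(
        range(n_layers),
        key=lambda l: separability[l] / num_params[l],
        reverse=True,
    )
    for l in order:
        extra = min(b_max - b_min, remaining / num_params[l])
        bits[l] += extra
        remaining -= extra * num_params[l]
    return bits


# Hypothetical three-layer network: scores, parameter counts, size budget in bits.
config = allocate_bits([0.9, 0.2, 0.5], [100, 400, 200], 3000)
print(config)
```

In this made-up example the layer with the lowest separability score stays at the minimum bit-width, which matches the abstract's observation that such layers can be quantized to lower bits.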