Improving the Post-Training Neural Network Quantization by Prepositive Feature Quantization

Tianshu Chu, Zuopeng Yang, Xiaolin Huang
DOI: https://doi.org/10.1109/tcsvt.2023.3311923
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Post-training neural network quantization (PTQ) is an effective model compression technology that has revolutionized the deployment of deep neural networks on various edge devices. It provides easy-to-use characteristics and allows for generating a quantized model based on a pre-trained counterpart without re-training. Typical PTQ approaches maintain output consistency through layer-wise calibration. However, these approaches still suffer from performance degradation primarily caused by feature quantization in ultra-low bitwidth conditions. To address this issue, we propose a prepositive feature quantization framework that decouples adjacent layers and calibrates the interaction between feature and parameter quantization perturbations. Additionally, we present a feature-loss-aware optimization strategy to solve the corresponding calibration problem. To validate the effectiveness of our method, we conducted extensive experiments on the ImageNet benchmark dataset. Our approach demonstrates a noticeable improvement in PTQ performance under the 2-bit condition.