Distribution Matched Low-bit Post-Training Quantization for Convolutional Neural Networks

Jiaying Zhang, Kefeng Huang, Chongyang Zhang
DOI: https://doi.org/10.1145/3579109.3579113
2022-01-01
Abstract: Post-training quantization of deep convolutional neural networks is highly desirable because it requires neither retraining nor access to the full training dataset. Unlike many recent uniform quantization methods, which quantize weights and activations by discretizing their value range into evenly spaced low-precision integers, we propose a Distribution Matched Low-bit Quantization (DMLQ) scheme that uses different quantization strategies for weights and activations. For weight quantization, we first select a pair of breakpoints that divides the weight distribution into dense and sparse regions, and then apply a uniform-for-dense, non-uniform-for-sparse quantization strategy to assign quantization levels more effectively. For activation quantization, we adopt a layer-wise log quantization strategy with a maximum-value-based offset, which mitigates the accuracy degradation caused by assigning quantization levels evenly and directly across the original activations. Evaluation on a large-scale image classification benchmark shows that the proposed scheme achieves state-of-the-art performance, especially for 4-bit quantization.
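The contrast the abstract draws between evenly spaced levels and log-spaced levels for activations can be illustrated with a small sketch. The abstract does not give the DMLQ formulas, so everything below is an assumption made for illustration: the function names `uniform_quantize` and `log_quantize` are hypothetical, the activations are assumed non-negative (for example, post-ReLU), the log base is assumed to be 2, and the "maximum-value based offset" is assumed to be the ceiling of log2 of the layer's largest activation. It is not the authors' implementation.

```python
import math


def uniform_quantize(values, bits):
    """Symmetric uniform quantization: evenly spaced levels over [-max_abs, max_abs]."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values]
    n_levels = 2 ** (bits - 1) - 1      # e.g. 7 positive levels for 4 bits
    step = max_abs / n_levels           # constant spacing between levels
    return [round(v / step) * step for v in values]


def log_quantize(activations, bits):
    """Layer-wise log2 quantization of non-negative activations.

    Assumed form (not taken from the paper): the offset is the exponent of the
    smallest power of two that covers the layer maximum, and each positive
    activation is stored as an integer code counting down from that exponent.
    """
    max_val = max(activations)
    if max_val <= 0:
        return [0.0 for _ in activations]
    offset = math.ceil(math.log2(max_val))  # max-value based offset (assumption)
    n_codes = 2 ** bits - 1                  # one code is reserved for exact zero
    quantized = []
    for a in activations:
        if a <= 0:
            quantized.append(0.0)
            continue
        code = offset - round(math.log2(a))       # distance below the top exponent
        code = min(max(code, 0), n_codes - 1)     # clip to the representable range
        quantized.append(2.0 ** (offset - code))  # dequantized value is a power of two
    return quantized


if __name__ == "__main__":
    layer_activations = [0.0, 0.03, 0.1, 0.5, 1.0, 6.0]
    print("uniform:", uniform_quantize(layer_activations, bits=4))
    print("log    :", log_quantize(layer_activations, bits=4))
```

With these made-up activations at 4 bits, the uniform quantizer rounds the two smallest positive values to zero because its step is set by the largest value, while the log quantizer keeps them as distinct powers of two at the cost of coarser resolution near the maximum. That trade-off is the general motivation for log-spaced levels on skewed activation distributions; the paper's actual offset and rounding rules may differ.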