Cross Range Quantization for Network Compression.

Yicai Yang, Xiaofen Xing, Ming Chen, Kailing Guo, Xiangmin Xu, Fang Liu
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191486
2023-01-01
Abstract: Quantization is effective in reducing model memory and accelerating inference, and is an important way to deploy deep neural networks on mobile smart devices. However, currently popular learnable quantization functions often simply truncate values that fall beyond the quantization range. We find that this truncation operation causes information loss and restricts the update of values outside the quantization range. To address this problem, we propose a universal cross range quantization (CRQ) method that reduces the information loss caused by the conventional truncation operation. CRQ splits the values exceeding the quantization range into two parts for separate quantization, and thus retains the information efficiently for performance improvement. In addition, we define a new metric named performance improvement efficiency (PIE) to measure the relationship between increased computation and performance improvement. Experiments on public benchmark image classification datasets show that CRQ achieves a significant accuracy gain with only a small increase in computation compared to the original learnable quantization method, and also outperforms many elaborately designed state-of-the-art quantization methods in terms of accuracy and PIE.
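The abstract does not give the CRQ formulas, so the following is only a minimal sketch of the general idea of splitting an out-of-range value into two parts and quantizing each separately. It assumes a symmetric uniform quantizer with a fixed range alpha (the paper's range is learnable), and assumes the split is "clipped part plus overflow residual". The function names uniform_quantize, truncating_quantize and split_quantize are hypothetical and are not taken from the paper.

```python
import numpy as np

def uniform_quantize(x, alpha, bits):
    """Symmetric uniform quantizer: values outside [-alpha, alpha] are truncated."""
    levels = 2 ** (bits - 1) - 1          # number of positive grid points
    step = alpha / levels                 # grid spacing
    return np.clip(np.round(x / step), -levels, levels) * step

def truncating_quantize(x, alpha, bits):
    """Conventional behaviour: anything beyond the range collapses to +/-alpha."""
    return uniform_quantize(x, alpha, bits)

def split_quantize(x, alpha, bits):
    """Assumed split: quantize the in-range part and the overflow separately."""
    in_range = np.clip(x, -alpha, alpha)  # part that fits the quantization range
    overflow = x - in_range               # zero for values already in range
    return uniform_quantize(in_range, alpha, bits) + uniform_quantize(overflow, alpha, bits)

x = np.array([0.3, 0.9, 1.6, -1.4])
print("truncating:", truncating_quantize(x, alpha=1.0, bits=4))
print("split:     ", split_quantize(x, alpha=1.0, bits=4))
```

In this sketch the overflow term is zero for every in-range value, so the two quantizers agree there and differ only on the outliers, where the split version keeps the magnitude that truncation discards. How CRQ actually performs the split and trains the range is described in the paper itself and may differ from this assumption.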