Towards Low-Bit Quantization of Deep Neural Networks with Limited Data

Yong Yuan, Chen, Xiyuan Hu, Silong Peng
DOI: https://doi.org/10.1109/icpr48806.2021.9412678
2021-01-01
Abstract: Recent machine learning methods use increasingly large deep neural networks (DNNs) to achieve state-of-the-art results on various tasks. Network quantization can effectively reduce computation and memory costs without modifying network structures, facilitating the deployment of DNNs on cloud and edge devices. However, most existing methods need time-consuming training or fine-tuning and access to the original training dataset, which may be unavailable due to privacy or security concerns. In this paper, we present a novel method to achieve low-precision quantization with limited data. First, to reduce the complexity of per-channel quantization and the accuracy degradation of per-layer quantization, we introduce group quantization, which separates the output channels into groups and processes each group independently. Second, to better distill knowledge from the pre-trained FP32 model with limited data, we introduce a two-stage knowledge distillation method that divides the optimization process into blockwise optimization and joint optimization, addressing the limitations of layer-wise supervision and global supervision. Extensive experiments on ImageNet2012 (ResNet18/50, ShuffleNetV2, and MobileNetV2) demonstrate that the proposed approach can significantly improve the quantized model's accuracy when only a few training samples are available. We further show that the method extends to other computer vision architectures and tasks such as object detection.
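The group quantization idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how channels are assigned to groups or which quantizer is used, so the sketch assumes contiguous groups of output channels and a symmetric, max-abs uniform quantizer. The function names `quantize_symmetric` and `group_quantize` are hypothetical.

```python
import numpy as np

def quantize_symmetric(w, num_bits):
    """Fake-quantize w with one shared scale (assumed symmetric max-abs scheme)."""
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 7 for 4 bits
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                              # dequantized float values

def group_quantize(weight, num_groups, num_bits=4):
    """Quantize a float tensor of shape (out_channels, ...) one channel group at a time.

    num_groups=1 corresponds to per-layer quantization and
    num_groups=out_channels to per-channel quantization.
    """
    out = np.empty_like(weight)
    channels = np.arange(weight.shape[0])
    # Assumption: groups are contiguous runs of output channels.
    for idx in np.array_split(channels, num_groups):
        if idx.size == 0:                         # more groups than channels
            continue
        out[idx] = quantize_symmetric(weight[idx], num_bits)
    return out

# Toy convolution weight whose channels have very different magnitudes.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3)) * np.linspace(0.1, 2.0, 8).reshape(-1, 1, 1, 1)

for groups in (1, 4, 8):
    err = np.mean((group_quantize(w, groups) - w) ** 2)
    print(f"groups={groups}: mean squared error = {err:.6f}")
```

The number of groups is the knob between the two extremes named in the abstract: each group stores its own scale, so fewer groups means fewer scales to keep and apply, while more groups lets each scale track the range of its own channels more closely.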