Quantized Feature Distillation for Network Quantization

Ke Zhu, Yin-Yin He, Jianxin Wu
2023-07-20
Abstract: Neural network quantization aims to accelerate and trim full-precision neural network models by using low-bit approximations. Methods adopting the quantization-aware training (QAT) paradigm have recently seen rapid growth, but are often conceptually complicated. This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD). QFD first trains a quantized (or binarized) representation as the teacher, then quantizes the network using knowledge distillation (KD). Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection, despite being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in real-world deployment. To the best of our knowledge, this is the first time that vision transformers have been quantized in object detection and image segmentation tasks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to improve the accuracy of low-bit quantized models and make quantization more flexible and effective across visual tasks. It proposes Quantized Feature Distillation (QFD), a simple and effective quantization-aware training (QAT) method based on knowledge distillation: quantized features serve as the teacher signal that guides the quantization of the student network. The method not only recovers the accuracy of the quantized model but also outperforms existing quantization methods on tasks such as image classification and object detection. The main contributions of the paper are:
1. It proposes QFD, a novel QAT knowledge distillation method that is easy to implement.
2. On classification, detection, and segmentation benchmarks, QFD shows a significant accuracy advantage over previous quantization-aware training methods.
3. It is the first attempt to quantize vision transformer architectures (such as ViT) for common object detection and segmentation tasks, verifying QFD's potential in practical applications.
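The idea of using quantized features as a teacher signal can be illustrated with a minimal sketch. This is not the authors' implementation: the uniform quantizer, the fixed clipping range, the mean-squared-error loss, and the function names `quantize_uniform` and `feature_distillation_loss` are all assumptions chosen for illustration, since the summary above does not specify them.

```python
from typing import List


def quantize_uniform(x: List[float], bits: int,
                     lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Clip each value to [lo, hi] and snap it to one of 2**bits evenly
    spaced levels. Assumption: a plain uniform quantizer, used only as a
    stand-in for whatever quantizer the paper actually uses."""
    levels = 2 ** bits - 1          # number of intervals between levels
    step = (hi - lo) / levels
    out = []
    for v in x:
        clipped = min(max(v, lo), hi)
        out.append(lo + round((clipped - lo) / step) * step)
    return out


def feature_distillation_loss(student_feat: List[float],
                              teacher_feat_q: List[float]) -> float:
    """Mean squared error between student features and the teacher's
    quantized features. Assumption: MSE is an illustrative choice."""
    if len(student_feat) != len(teacher_feat_q):
        raise ValueError("feature vectors must have the same length")
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat_q)) / n


# Hypothetical feature vectors for a single input.
teacher_features = [0.0, 0.4, 1.2]
student_features = [0.1, 0.2, 0.9]

# The teacher's representation is quantized (here binarized with bits=1) ...
teacher_q = quantize_uniform(teacher_features, bits=1)

# ... and the student is trained to match it via a distillation loss.
loss = feature_distillation_loss(student_features, teacher_q)
print(teacher_q, loss)
```

In a real training loop, a loss like this would typically be combined with the task loss and minimized by gradient descent over the quantized student network; that part is omitted here.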