Precision-Mixed and Weight-Average Ensemble: Online Knowledge Distillation for Quantization Convolutional Neural Networks

Zijia Mo, Zhipeng Gao, Chen Zhao, Xinlei Yu, Kaile Xiao
DOI: https://doi.org/10.1109/WCNC55385.2023.10118790
2023-01-01
Abstract: Lightweight models with high accuracy are critical for edge intelligence. Knowledge Distillation (KD) has been successfully applied to reduce the accuracy loss of quantized neural networks, especially on resource-constrained edge devices, but pre-training a complex high-precision teacher network for KD incurs substantial training overhead. Recently proposed online distillation frameworks offer a good solution for teacher-free distillation, but the regularization effect of KD and its simple average aggregation further weaken the representation capability of quantized models that have been reconstructed. In this work, we propose the Precision-Mixed and Weight-Average Ensemble (PMWAE), which consists of multiple group members and a group leader. PMWAE provides additional knowledge by changing the bit-precision of the activations, and generates aggregated weights for each group member through an attention-based mechanism. The ensemble knowledge is then passed to the group leader to obtain the final model. Extensive experiments on the CIFAR-10/100 and ImageNet-1K datasets show that our method outperforms existing state-of-the-art methods on both standard convolutions and depth-wise separable convolutions.
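The abstract names two ingredients, activations quantized at different bit-precisions and attention-based aggregation across group members, without giving formulas. The following is only an illustrative sketch of those two ideas and not the authors' implementation. It assumes uniform quantization of activations clipped to [0, 1] and dot-product attention over member logits; the function names `quantize_activation` and `attention_ensemble` are hypothetical.

```python
import math


def quantize_activation(x, bits):
    """Uniformly quantize a scalar activation, clipped to [0, 1], to `bits` bits.

    Assumption: a plain uniform quantizer; the paper may use a different scheme.
    """
    levels = (1 << bits) - 1          # number of quantization steps
    clipped = min(max(x, 0.0), 1.0)   # clip into the representable range
    return round(clipped * levels) / levels


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_ensemble(member_logits):
    """Return, for each member, an attention-weighted average of all members' logits.

    Assumption: attention scores are dot products between logit vectors.
    A simple average would instead give every member the same uniform weights.
    """
    targets = []
    for query in member_logits:
        scores = [sum(q * k for q, k in zip(query, key)) for key in member_logits]
        weights = softmax(scores)     # one weight per member, summing to 1
        targets.append([
            sum(w * logits[c] for w, logits in zip(weights, member_logits))
            for c in range(len(query))
        ])
    return targets


# Hypothetical group: three members whose activations use 2, 4 and 8 bits.
member_bits = [2, 4, 8]
quantized = [quantize_activation(0.4, b) for b in member_bits]

# Hypothetical per-member logits for a two-class problem.
ensemble_targets = attention_ensemble([[2.0, 0.0], [0.0, 2.0], [1.5, 0.5]])
```

Under these assumptions, each member receives its own distillation target, weighted toward the peers whose predictions are most similar to its own, rather than one shared average.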