Dual-teacher Distillation Based on Interpretable Guidance for Lightening Mobile Model

Rui Mao, Zhichao Lian
DOI: https://doi.org/10.1109/icicml60161.2023.10424869
2023-01-01
Abstract: Recent years have witnessed the increasing convergence of algorithms, protocols, and applications for mobility, sensing, and networking. Making models lightweight has become a top priority. The most advanced distillation methods currently distill the deep features of a neural network's intermediate layers, but existing knowledge distillation methods still suffer from problems such as the teacher's inability to transfer key knowledge effectively and the low quality of the knowledge that is transferred. Some recent methods introduce attention mechanisms to improve the quality of knowledge transfer, but they have problems of their own, such as rigid teaching by the teacher and limited application scenarios. To address these problems, we propose a dual-teacher distillation algorithm based on interpretable guidance, which uses an interpretability algorithm to extract the features of the teacher network that are important for classification. In contrast to vanilla distillation, a second teacher discards the useless features of the original teacher and guides the student to learn the key knowledge. In addition, correlation factors allow the weight of the guidance provided by the two teachers to be chosen more flexibly. Experiments on the Tiny ImageNet dataset show that our algorithm achieves better classification performance than state-of-the-art (SOTA) methods when training small models.
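The abstract does not give the loss function, so the following is only a minimal sketch of how guidance from two teachers might be blended, not the authors' method. It assumes a standard temperature-softened KL distillation term per teacher, and it stands in a single hypothetical scalar `alpha` for the paper's correlation factors. The function names, the `temperature` default, and the use of logits rather than intermediate features are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dual_teacher_loss(student_logits, teacher1_logits, teacher2_logits,
                      alpha=0.5, temperature=4.0):
    """Blend two distillation terms with a hypothetical weight `alpha`.

    teacher1 plays the role of the original teacher; teacher2 plays the
    role of the second teacher that keeps only the key features.
    `alpha` is an assumed stand-in for the paper's correlation factors.
    """
    s = softmax(student_logits, temperature)
    t1 = softmax(teacher1_logits, temperature)
    t2 = softmax(teacher2_logits, temperature)
    loss1 = kl_divergence(t1, s)
    loss2 = kl_divergence(t2, s)
    # The T^2 factor is the usual gradient rescaling in soft-target distillation.
    return (temperature ** 2) * (alpha * loss1 + (1.0 - alpha) * loss2)
```

In this sketch, `alpha=1.0` reduces to ordinary single-teacher distillation from the original teacher, and lowering `alpha` shifts the student toward the second teacher's filtered guidance.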
What problem does this paper attempt to address?