DFGPD: a new distillation framework with global and positional distillation

Weixing Su, Haoyu Wang, Fang Liu, Linfeng Li
DOI: https://doi.org/10.1007/s00530-024-01503-9
IF: 3.9
2024-09-21
Multimedia Systems
Abstract: Knowledge distillation is a commonly used model-compression method that has been widely applied to various computer vision tasks. Many works use attention mechanisms to guide the student network during training, encouraging it to mimic the important features of the teacher. However, most of these works guide the student with either a channel attention map or a spatial attention map, ignoring the importance of positional features. In this paper, we propose a new distillation framework that transfers global and positional features (DFGPD). It consists of three parts: global and positional distillation, a generic teacher framework, and a two-stage distillation method. DFGPD takes positional information into account to make the distillation process more effective. We conduct extensive comparison experiments, ablation studies, and sensitivity studies to demonstrate the effectiveness and stability of DFGPD. Our results show that (1) DFGPD achieves performance comparable to or better than state-of-the-art methods, and (2) DFGPD can, to a certain extent, alleviate the issue that bigger models are not always better teachers.
computer science, information systems, theory & methods
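The abstract contrasts channel-attention and spatial-attention guidance with guidance that also uses positional features, but it does not give DFGPD's formulas. The following is a minimal, dependency-free sketch of that general idea under stated assumptions, not the paper's method. Spatial attention is taken as the sum of squared activations over channels (attention-transfer style). Channel attention is per-channel global average pooling. The "positional" descriptor is a hypothetical stand-in that averages along width and along height separately, in the spirit of coordinate attention. The names `spatial_attention`, `channel_attention`, `positional_attention`, and `distillation_loss`, as well as the equal default weights, are illustrative only. Teacher and student feature maps are assumed to share the same C x H x W shape.

```python
import math
from typing import List

# A feature map indexed as fmap[channel][row][col]; teacher and student are
# ASSUMED to have identical shapes (a real setup may need a projection layer).
FeatureMap = List[List[List[float]]]


def _l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 norm so only its pattern, not magnitude, is compared."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def spatial_attention(f: FeatureMap) -> List[float]:
    """Attention-transfer-style map: sum of squared activations over channels, flattened."""
    c_n, h_n, w_n = len(f), len(f[0]), len(f[0][0])
    amap = [sum(f[c][h][w] ** 2 for c in range(c_n))
            for h in range(h_n) for w in range(w_n)]
    return _l2_normalize(amap)


def channel_attention(f: FeatureMap) -> List[float]:
    """Per-channel global average pooling; discards all spatial layout."""
    return _l2_normalize([sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                          for ch in f])


def positional_attention(f: FeatureMap) -> List[float]:
    """HYPOTHETICAL positional descriptor: for each channel, a row profile
    (average along width) and a column profile (average along height)."""
    h_n, w_n = len(f[0]), len(f[0][0])
    row_profile = [sum(ch[h]) / w_n for ch in f for h in range(h_n)]
    col_profile = [sum(ch[h][w] for h in range(h_n)) / h_n
                   for ch in f for w in range(w_n)]
    return _l2_normalize(row_profile + col_profile)


def _sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distillation_loss(student: FeatureMap, teacher: FeatureMap,
                      w_spatial: float = 1.0, w_channel: float = 1.0,
                      w_pos: float = 1.0) -> float:
    """Weighted sum of squared distances between normalized student and teacher
    descriptors; the weights are illustrative, not values from the paper."""
    return (w_spatial * _sq_dist(spatial_attention(student), spatial_attention(teacher))
            + w_channel * _sq_dist(channel_attention(student), channel_attention(teacher))
            + w_pos * _sq_dist(positional_attention(student), positional_attention(teacher)))


if __name__ == "__main__":
    # Same single activation, different location: channel pooling cannot tell them apart.
    teacher = [[[1.0, 0.0], [0.0, 0.0]]]
    student = [[[0.0, 0.0], [0.0, 1.0]]]
    print("channel-only loss:", distillation_loss(student, teacher, 0.0, 1.0, 0.0))
    print("positional-only loss:", distillation_loss(student, teacher, 0.0, 0.0, 1.0))
```

In the usage example, the channel-only term is zero because global pooling ignores where the activation lies. The positional term is about 2.0, which shows why a purely channel-based signal can miss location information. All three descriptors are L2-normalized, so the loss is insensitive to a uniform positive rescaling of the student's activations.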