Multistage feature fusion knowledge distillation

Gang Li, Kun Wang, Pengfei Lv, Pan He, Zheng Zhou, Chuanyun Xu
DOI: https://doi.org/10.1038/s41598-024-64041-4
IF: 4.6
2024-06-13
Scientific Reports
Abstract: Generally, the recognition performance of lightweight models is lower than that of large models. Knowledge distillation, in which a teacher model teaches a student model, can further enhance the recognition accuracy of lightweight models. In this paper, we approach knowledge distillation from the perspective of intermediate feature-level knowledge distillation. We combine a cross-stage feature fusion symmetric framework, an attention mechanism to enhance the fused features, and a contrastive loss function for teacher and student models at the same stage to implement a comprehensive multistage feature fusion knowledge distillation method. This approach addresses the problem that large differences between the intermediate feature distributions of the teacher and student models make it difficult for the student to learn implicit knowledge effectively, and it thereby improves the recognition accuracy of the student model. Compared to existing knowledge distillation methods, our method achieves superior performance. On the CIFAR100 dataset, it boosts the recognition accuracy of ResNet20 from 69.06% to 71.34%, and on the TinyImagenet dataset, it increases the recognition accuracy of ResNet18 from 66.54% to 68.03%, demonstrating the effectiveness and generalizability of our approach. The overall distillation structure and the feature extraction methods of this approach still leave room for optimization, which requires further research and exploration.
multidisciplinary sciences
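The abstract's idea of fusing features from different stages and then enhancing the fused result with attention can be illustrated with a small plain-Python sketch. It is not the authors' module: it assumes the two stage features have already been resized to the same shape [C][H][W], uses element-wise addition as the (hypothetical) fusion step, and replaces learned attention layers with a parameter-free simplification in the channel-then-spatial order used by CBAM-style designs. The names `fuse`, `channel_attention`, `spatial_attention`, and `fused_attention` are illustrative only.

```python
import math
from typing import List

# A feature map is a nested list with shape [C][H][W].
Feature = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse(a: Feature, b: Feature) -> Feature:
    """Hypothetical fusion: element-wise sum of two same-shape stage features."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def channel_attention(f: Feature) -> Feature:
    """Scale each channel by sigmoid(global average of that channel).
    Simplification: no learned MLP, unlike real channel-attention modules."""
    out = []
    for ch in f:
        avg = sum(v for row in ch for v in row) / (len(ch) * len(ch[0]))
        gate = sigmoid(avg)
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(f: Feature) -> Feature:
    """Scale each (h, w) position by sigmoid(mean over channels at that position).
    Simplification: no learned convolution over the pooled map."""
    c, h, w = len(f), len(f[0]), len(f[0][0])
    gate = [[sigmoid(sum(f[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]
    return [[[f[k][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
            for k in range(c)]


def fused_attention(a: Feature, b: Feature) -> Feature:
    """Fuse two stage features, then apply channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(fuse(a, b)))
```

The output keeps the input shape, so an enhanced feature of this kind could be compared stage by stage with the corresponding feature of the other network; in a trainable version the two gates would come from learned layers rather than fixed averages.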
What problem does this paper attempt to address?
The problem this paper attempts to solve is that, during knowledge distillation, the intermediate feature distributions of the lightweight model (the student model) and the large model (the teacher model) differ significantly. This makes it difficult for the student model to learn the implicit knowledge in the teacher model effectively, which limits the student's recognition accuracy. The paper proposes a multi-stage feature-fusion knowledge distillation method to overcome this challenge and improve the recognition performance of the lightweight model. Specifically, it proposes the following innovations:

1. **Multi-stage Feature-fusion Framework (MSFF)**: a framework with a multi-level symmetric structure, designed so that the multi-level knowledge of the teacher model can be transferred effectively to the student model.
2. **Feature-fusion Attention Module (FFA)**: a module that uses spatial and channel attention mechanisms to extract and fuse features from different stages, concentrating the feature knowledge.
3. **Spatial and Channel Mean-squared-error Loss Function (SCM)**: a loss that compares the features of the teacher and student models at the same stage from three perspectives (original features, spatial features, and channel features), so that the student model can learn the teacher's knowledge more effectively.

The paper verifies the effectiveness and generalization ability of the method on the CIFAR100 and TinyImagenet datasets. On CIFAR100, the recognition accuracy of ResNet20 increases from 69.06% to 71.34% (+2.28 percentage points), and on TinyImagenet, the recognition accuracy of ResNet18 increases from 66.54% to 68.03% (+1.49 percentage points).
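The three-way comparison performed by the SCM loss can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the teacher and student features are assumed to already have the same shape [C][H][W] (in practice a projection layer may be needed), the "spatial feature" is taken to be the channel-wise mean at each position, the "channel feature" is taken to be the per-channel global average, and the three MSE terms are combined with hypothetical weights `w_orig`, `w_spatial`, and `w_channel` that default to 1. All function names are illustrative.

```python
from typing import List

# A feature map is a nested list with shape [C][H][W].
Feature = List[List[List[float]]]


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two non-empty, equal-length flat vectors."""
    assert a and len(a) == len(b), "vectors must be non-empty and equal length"
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def flatten(f: Feature) -> List[float]:
    """Original feature: every element of the C*H*W map."""
    return [v for ch in f for row in ch for v in row]


def spatial_descriptor(f: Feature) -> List[float]:
    """Assumed spatial feature: mean over channels at each (h, w), flattened to H*W."""
    c, h, w = len(f), len(f[0]), len(f[0][0])
    return [sum(f[k][i][j] for k in range(c)) / c for i in range(h) for j in range(w)]


def channel_descriptor(f: Feature) -> List[float]:
    """Assumed channel feature: global average of each channel (length C)."""
    return [sum(v for row in ch for v in row) / (len(ch) * len(ch[0])) for ch in f]


def scm_loss(student: Feature, teacher: Feature,
             w_orig: float = 1.0, w_spatial: float = 1.0,
             w_channel: float = 1.0) -> float:
    """Hypothetical SCM-style loss: weighted sum of original, spatial, and channel MSE."""
    l_orig = mse(flatten(student), flatten(teacher))
    l_spatial = mse(spatial_descriptor(student), spatial_descriptor(teacher))
    l_channel = mse(channel_descriptor(student), channel_descriptor(teacher))
    return w_orig * l_orig + w_spatial * l_spatial + w_channel * l_channel
```

For example, comparing a student map of all zeros with a teacher map of all ones (2 channels, 2×2 each) gives 1.0 for each of the three terms, so `scm_loss` returns 3.0. The spatial and channel terms penalize differences in where and in which channels the activation mass sits, which the element-wise term alone weights only indirectly.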
In conclusion, through its multi-stage feature-fusion knowledge distillation method, this paper addresses the difficulty the student model has in effectively learning the teacher model's implicit knowledge and improves the recognition performance of the lightweight model.