Abstract: The recognition performance of lightweight models is generally lower than that of large models. Knowledge distillation, in which a teacher model guides the training of a student model, can further improve the recognition accuracy of lightweight models. In this paper, we approach knowledge distillation at the level of intermediate features. We combine three components: a symmetric cross-stage feature fusion framework, an attention mechanism that enhances the fused features, and a loss function that compares teacher and student features at the same stage. Together they form a multistage feature fusion knowledge distillation method. The method targets the large differences between the intermediate feature distributions of the teacher and student models. These differences make it difficult for the student to learn the teacher's implicit knowledge, so reducing them improves the student's recognition accuracy. Our method outperforms the existing knowledge distillation methods it is compared against. On the CIFAR100 dataset, it raises the recognition accuracy of ResNet20 from 69.06% to 71.34%, and on the TinyImageNet dataset, it raises the recognition accuracy of ResNet18 from 66.54% to 68.03%, demonstrating the effectiveness and generalizability of the approach. The overall distillation structure and the feature extraction methods still leave room for optimization, which calls for further research.
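One way to read the abstract's description is as a training objective that adds per-stage feature terms to the usual classification loss. The form below is an assumed sketch, not an equation stated in the paper: $n$ is the number of stages, $\tilde{F}^{T}_{s}$ and $\tilde{F}^{S}_{s}$ are the fused teacher and student features at stage $s$, $z^{S}$ are the student logits, $y$ is the ground-truth label, $\mathcal{L}_{\text{feat}}$ is the stage-wise comparison loss, and $\beta$ is a hypothetical weighting hyperparameter.

$$
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\mathrm{CE}}\left(z^{S}, y\right)
  + \beta \sum_{s=1}^{n} \mathcal{L}_{\text{feat}}\left(\tilde{F}^{T}_{s}, \tilde{F}^{S}_{s}\right)
$$

Under this reading, the symmetric framework guarantees that $\tilde{F}^{T}_{s}$ and $\tilde{F}^{S}_{s}$ are produced by the same fusion procedure, so each term compares like with like.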

What problem does this paper attempt to address?

The problem this paper addresses is that, during knowledge distillation, the intermediate feature distributions of the lightweight model (the student) and the large model (the teacher) differ significantly. This makes it difficult for the student to learn the teacher's implicit knowledge and limits the student's recognition accuracy. The paper proposes a multi-stage feature-fusion knowledge distillation method to overcome this challenge and improve the recognition performance of the lightweight model. Its main contributions are:

1. **Multi-stage Feature-fusion Framework (MSFF)**: a framework with a multi-level symmetric structure, designed so that the teacher's knowledge at multiple levels can be transferred effectively to the student.
2. **Feature-fusion Attention Module (FFA)**: a module that uses spatial and channel attention to extract and fuse features from different stages, concentrating the feature knowledge.
3. **Spatial and Channel Mean-squared-error Loss Function (SCM)**: a loss that compares teacher and student features at the same stage along three dimensions (the original features, the spatial features, and the channel features), so that the student can learn the teacher's knowledge more effectively.

The paper verifies the effectiveness and generalization ability of the method on the CIFAR100 and TinyImageNet datasets. On CIFAR100, the recognition accuracy of ResNet20 increases from 69.06% to 71.34% (+2.28 percentage points), and on TinyImageNet, the recognition accuracy of ResNet18 increases from 66.54% to 68.03% (+1.49 percentage points).
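The three components can be illustrated with a small NumPy sketch. Everything in it is an assumption for illustration, not the paper's implementation. The function names (`avg_pool_to`, `ffa_fuse`, `scm_loss`, `multistage_distill_loss`) are hypothetical. The attention is parameter-free: channel attention comes from global average pooling and spatial attention from the channel-wise mean, applied in a CBAM-like channel-then-spatial order, whereas the paper's FFA may use learnable layers. All stages are assumed to share one channel count, so a shallower fused feature only needs spatial downsampling before it is added to the next stage. Teacher and student features are assumed to already have matching shapes. In the SCM-style loss, the "spatial" and "channel" features are taken to be the channel-wise mean map and the per-channel spatial mean, and the three terms are weighted equally by default.

```python
import numpy as np


def avg_pool_to(x, h, w):
    """Average-pool a (C, H, W) array down to (C, h, w); H and W must be multiples of h and w."""
    c, big_h, big_w = x.shape
    if big_h % h or big_w % w:
        raise ValueError("input size must be a multiple of the target size")
    return x.reshape(c, h, big_h // h, w, big_w // w).mean(axis=(2, 4))


def channel_descriptor(x):
    """Per-channel spatial mean of a (C, H, W) array -> shape (C,)."""
    return x.mean(axis=(1, 2))


def spatial_descriptor(x):
    """Channel-wise mean of a (C, H, W) array -> shape (H, W)."""
    return x.mean(axis=0)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def ffa_fuse(prev_fused, cur):
    """Hypothetical, parameter-free stand-in for the FFA module.

    Downsamples the previous stage's fused feature to the current stage's size,
    adds it to the current feature, then reweights the sum with channel
    attention followed by spatial attention. Assumes equal channel counts.
    """
    _, h, w = cur.shape
    fused = avg_pool_to(prev_fused, h, w) + cur
    fused = fused * sigmoid(channel_descriptor(fused))[:, None, None]
    fused = fused * sigmoid(spatial_descriptor(fused))[None, :, :]
    return fused


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def scm_loss(f_t, f_s, weights=(1.0, 1.0, 1.0)):
    """SCM-style loss: MSE on the original, spatial and channel views (weights are assumed)."""
    w_orig, w_spat, w_chan = weights
    return (w_orig * mse(f_t, f_s)
            + w_spat * mse(spatial_descriptor(f_t), spatial_descriptor(f_s))
            + w_chan * mse(channel_descriptor(f_t), channel_descriptor(f_s)))


def multistage_distill_loss(teacher_feats, student_feats):
    """Sum of stage-wise SCM-style losses over symmetrically fused features.

    Both lists hold per-stage (C, H, W) features ordered shallow to deep, and
    teacher and student shapes are assumed to match stage by stage.
    """
    fused_t, fused_s = teacher_feats[0], student_feats[0]
    total = scm_loss(fused_t, fused_s)
    for f_t, f_s in zip(teacher_feats[1:], student_feats[1:]):
        # The same fusion is applied on both sides (the "symmetric" structure).
        fused_t = ffa_fuse(fused_t, f_t)
        fused_s = ffa_fuse(fused_s, f_s)
        total += scm_loss(fused_t, fused_s)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shapes = [(8, 16, 16), (8, 8, 8), (8, 4, 4)]  # hypothetical 3-stage shapes (C, H, W)
    teacher = [rng.standard_normal(shp) for shp in shapes]
    student = [rng.standard_normal(shp) for shp in shapes]
    print(multistage_distill_loss(teacher, teacher))  # 0.0: identical features
    print(multistage_distill_loss(teacher, student))  # positive for differing features
```

Applying the same fusion to teacher and student features mirrors the symmetric design described for MSFF: each stage's loss compares fused features that were built the same way on both sides. With identical teacher and student features the loss is exactly zero, which serves as a quick sanity check.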
In conclusion, the multi-stage feature-fusion knowledge distillation method helps the student model learn the teacher model's implicit knowledge more effectively and improves the recognition performance of the lightweight model.

Multistage feature fusion knowledge distillation
