Abstract: Traditional knowledge distillation relies on high-capacity teacher models to supervise the training of compact student networks. To avoid the computational cost of pretraining such teachers, teacher-free online knowledge distillation methods have been proposed and achieve satisfactory performance. Among these, feature fusion methods effectively alleviate the limitations of training without the guidance of a powerful teacher model. However, existing feature fusion methods focus primarily on end-layer features and overlook the efficient use of holistic knowledge loops and high-level information within the network. In this article, we propose a new feature fusion-based mutual learning method called Diversify Feature Enhancement and Fusion for Online Knowledge Distillation (DFEF). First, we enhance high-level semantic information by mapping multiple end-of-network features to obtain richer feature representations. Next, we design a self-distillation module to strengthen knowledge interactions between the deep and shallow layers of the network. Additionally, we apply attention mechanisms to give deeper and more diversified enhancement to the input feature maps of the self-distillation module, allowing the entire network to acquire a broader range of knowledge. Finally, we use feature fusion to merge the enhanced features and generate a high-performance virtual teacher that guides the training of the student model. Extensive evaluations on the CIFAR-10, CIFAR-100, and CINIC-10 datasets demonstrate that the proposed method significantly improves performance over state-of-the-art feature fusion-based online knowledge distillation methods. Our code can be found at https://github.com/JSJ515-Group/DFEF-Liu.
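
To make the idea of a virtual teacher concrete, the following is a minimal sketch of teacher-free online distillation in plain Python. It is not the DFEF implementation: the fusion step is assumed here to be a simple average of peer-branch logits, whereas the abstract describes fusing enhanced feature maps, and the names `fuse_logits`, `distillation_losses`, and the temperature value are hypothetical choices for illustration. Each peer branch is pulled toward the fused prediction with a temperature-scaled KL divergence, the standard soft-label distillation loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fuse_logits(branch_logits):
    """Assumed fusion rule: element-wise mean of the peer branches' logits.

    This stands in for the paper's feature fusion module, which is not
    specified in the abstract.
    """
    n = len(branch_logits)
    return [sum(column) / n for column in zip(*branch_logits)]


def distillation_losses(branch_logits, temperature=3.0):
    """Per-branch loss pulling each peer toward the fused virtual teacher."""
    teacher = softmax(fuse_logits(branch_logits), temperature)
    return [
        temperature ** 2 * kl_divergence(teacher, softmax(logits, temperature))
        for logits in branch_logits
    ]


if __name__ == "__main__":
    # Two hypothetical peer branches predicting over three classes.
    peers = [[2.0, 0.5, -1.0], [1.0, 1.5, -0.5]]
    print(distillation_losses(peers))
```

In an actual training loop these losses would be added to each branch's cross-entropy loss on the ground-truth labels, and the fused target would typically be treated as a constant when backpropagating into the branches.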
