Abstract: Traditional knowledge distillation relies on high-capacity teacher models to supervise the training of compact student networks. Teacher-free online knowledge distillation methods avoid the computational cost of pretraining such teachers while still achieving satisfactory performance. Among these methods, feature fusion approaches have effectively alleviated the limitations of training without the guidance of a powerful teacher model. However, existing feature fusion methods focus primarily on end-layer features and overlook the efficient use of holistic knowledge loops and high-level information within the network. In this article, we propose a new feature fusion-based mutual learning method called Diversify Feature Enhancement and Fusion for Online Knowledge Distillation (DFEF). First, we enhance high-level semantic information by mapping multiple end-of-network features to obtain richer feature representations. Next, we design a self-distillation module to strengthen knowledge interaction between the deep and shallow layers of the network. Additionally, we employ attention mechanisms to provide deeper and more diversified enhancement of the input feature maps of the self-distillation module, allowing the entire network architecture to acquire a broader range of knowledge. Finally, we employ feature fusion to merge the enhanced features and generate a high-performance virtual teacher that guides the training of the student model. Extensive evaluations on the CIFAR-10, CIFAR-100, and CINIC-10 datasets demonstrate that the proposed method significantly improves performance over state-of-the-art feature fusion-based online knowledge distillation methods. Our code can be found at https://github.com/JSJ515-Group/DFEF-Liu.
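The idea of a fused virtual teacher guiding peer students can be illustrated with a minimal sketch. This is not the DFEF implementation: as a simplifying assumption, the fusion step here is an element-wise mean of peer logits, whereas the abstract describes fusing enhanced feature maps. The function names (`fuse_logits`, `distillation_loss`), the temperature value, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fuse_logits(peer_logits):
    """Hypothetical fusion: element-wise mean of the peers' logits.

    Assumption for illustration only; the abstract describes fusing
    enhanced feature maps rather than averaging logits.
    """
    n = len(peer_logits)
    return [sum(column) / n for column in zip(*peer_logits)]


def distillation_loss(student_logits, teacher_logits, temperature=3.0):
    """Standard soft-target loss: T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


# Toy logits for two peer students on a 3-class problem (made-up values).
peer_a = [2.0, 0.5, -1.0]
peer_b = [1.0, 1.5, -0.5]

# The virtual teacher is built from the peers themselves, with no pretrained teacher.
virtual_teacher = fuse_logits([peer_a, peer_b])

# Each peer is pulled toward the fused prediction.
loss_a = distillation_loss(peer_a, virtual_teacher)
loss_b = distillation_loss(peer_b, virtual_teacher)
```

In a real training loop, each peer would typically combine a term like this with its own cross-entropy loss on the ground-truth labels, and the fusion module would be trained jointly with the peers.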

Implicit Feature Alignment for Knowledge Distillation.

Adaptive Informative Semantic Knowledge Transfer for Knowledge Distillation

Hybrid mix-up contrastive knowledge distillation

Distilling Knowledge by Mimicking Features

Strengthening Attention: Knowledge Distillation Via Cross-Layer Feature Fusion for Image Classification

Knowledge Distillation with Feature Maps for Image Classification

Interactive Knowledge Distillation for image classification

Multistage feature fusion knowledge distillation

Knowledge distillation based on multi-layer fusion features

Input-Dependent Dynamical Channel Association for Knowledge Distillation.

In Defense of Feature Mimicking for Knowledge Distillation.

An Embarrassingly Simple Approach for Knowledge Distillation

Attention-based Feature Interaction for Efficient Online Knowledge Distillation.

Feature Fusion-Based Collaborative Learning for Knowledge Distillation.

Revisiting Knowledge Distillation: an Inheritance and Exploration Framework

A Two-Teacher Framework For Knowledge Distillation

Enhancement of Knowledge Distillation via Non-Linear Feature Alignment

Frustratingly Easy Knowledge Distillation Via Attentive Similarity Matching.

DFEF: Diversify feature enhancement and fusion for online knowledge distillation

Adaptive Explicit Knowledge Transfer for Knowledge Distillation

Semi-Online Knowledge Distillation