Abstract: Knowledge distillation (KD) is a technique that transfers “dark knowledge” from a deep teacher network (teacher) to a shallow student network (student). Despite significant advances in KD, existing work has not adequately mined two crucial types of knowledge: 1) the knowledge of head categories, which represents the relationship between the target category and the categories most similar to it; our findings reveal that this highly similar (complex) knowledge is essential for improving the student’s performance; and 2) the knowledge of tail categories, which is not utilized effectively because existing studies often treat the non-target categories collectively, without sufficiently considering how effective the knowledge from tail categories is. To tackle these challenges, we reformulate classical KD (ReKD) into two components: Top-K Inter-class Similar Distillation (TISD) and Non-Top-K Inter-class Discriminability (NTID). First, TISD captures the knowledge of head categories and imparts it to the student. Our experimental results verify that TISD is particularly effective at transferring this knowledge, even on fine-grained classification datasets. Second, we theoretically show that the weighting coefficient of NTID increases with the probability of Top-K, leading to stronger suppression of knowledge transfer for tail categories. This observation explains why difficult samples are more informative than simple ones. To better utilize both types of knowledge, we optimize TISD and NTID with different weighting coefficients, thereby enhancing the student’s ability to learn valuable knowledge from both head and tail categories. Furthermore, our extensive experimental results demonstrate that ReKD achieves state-of-the-art performance on image classification datasets, including CIFAR-100, Tiny-ImageNet, and ImageNet-1K, as well as on object detection and instance segmentation with the MS-COCO dataset.
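To make the idea of separating head and tail categories concrete, the following is a minimal sketch and not the formulation of the paper, whose exact TISD and NTID definitions are not given in the abstract. It assumes the standard chain rule of the KL divergence, which splits the classical KD loss into a part over the teacher's Top-K categories and a part over the remaining categories, with the latter scaled by a factor that depends on the teacher's Top-K probability mass. The function names, the grouping of the binary term with the head part, and the weights `alpha` and `beta` are hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def split_kd_terms(teacher_logits, student_logits, k=2, temperature=1.0):
    """Split KL(teacher || student) into a Top-K part and a non-Top-K part.

    Assumption: Top-K membership is decided by the teacher's probabilities,
    and 0 < k < number of classes. This is the generic KL chain rule,
    not necessarily the grouping used by TISD and NTID.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    top, rest = order[:k], order[k:]

    p_top, q_top = sum(p[i] for i in top), sum(q[i] for i in top)
    p_rest, q_rest = sum(p[i] for i in rest), sum(q[i] for i in rest)

    # Binary term: how much mass falls inside vs. outside the Top-K set.
    binary = kl([p_top, p_rest], [q_top, q_rest])
    # Distributions renormalized within each group.
    within_top = kl([p[i] / p_top for i in top], [q[i] / q_top for i in top])
    within_rest = kl([p[i] / p_rest for i in rest], [q[i] / q_rest for i in rest])

    head_term = binary + p_top * within_top  # hypothetical grouping
    tail_term = p_rest * within_rest         # scaled by the non-Top-K mass
    return head_term, tail_term


def weighted_kd_loss(teacher_logits, student_logits, k=2,
                     alpha=1.0, beta=1.0, temperature=1.0):
    """Weight the two parts separately; alpha = beta = 1 recovers plain KL."""
    head_term, tail_term = split_kd_terms(
        teacher_logits, student_logits, k, temperature)
    return alpha * head_term + beta * tail_term
```

With `alpha = beta = 1` the two terms sum to the ordinary KL divergence between teacher and student, so the split itself loses no information. Choosing `beta` larger than `alpha` is one way to counteract the small factor that the tail part receives when the teacher is confident about its Top-K categories.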

Knowledge Distillation on Multiple Experts for Long-Tailed Recognition

Learning from Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification

One‐stage self‐distillation guided knowledge transfer for long‐tailed visual recognition

Balanced Self-Distillation for Long-Tailed Recognition

Dynamic collaborative learning with heterogeneous knowledge transfer for long-tailed visual recognition

Relieving the Incompatibility of Network Representation and Classification for Long-Tailed Data Distribution

Adversarial-Based Knowledge Distillation for Multi-Model Ensemble and Noisy Data Refinement

Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation

Multi-Label Knowledge Distillation

Towards Effective Collaborative Learning in Long-Tailed Recognition

MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition

Sample-Aware Knowledge Distillation for Long-Tailed Learning

Balanced Knowledge Distillation for Long-tailed Learning

Learn from Balance: Rectifying Knowledge Transfer for Long-Tailed Scenarios

Knowledge Distillation from Single to Multi Labels: an Empirical Study

Improving Knowledge Distillation Via Head and Tail Categories

Teacher-student collaborative knowledge distillation for image classification

Feature Distribution Representation Learning Based on Knowledge Transfer for Long-Tailed Classification

Online Knowledge Distillation via Multi-branch Diversity Enhancement

Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment

Multiple Teachers-Meticulous Student: A Domain Adaptive Meta-Knowledge Distillation Model for Medical Image Classification