Abstract: We present FerKD, a novel efficient knowledge distillation framework that incorporates partial soft-hard label adaptation coupled with a region-calibration mechanism. Our approach stems from the observation that standard data augmentations, such as RandomResizedCrop, tend to transform inputs into diverse conditions: easy positives, hard positives, or hard negatives. In traditional distillation frameworks, these transformed samples are utilized equally through their predictive probabilities derived from pretrained teacher models. However, relying solely on prediction values from a pretrained teacher, a common practice in prior studies, neglects the reliability of these soft label predictions. To address this, we propose a new scheme that calibrates the less-confident regions to be the context using softened hard ground-truth labels. Our approach involves two processes: hard-region mining and calibration. We demonstrate empirically that this method can dramatically improve convergence speed and final accuracy. Additionally, we find that a consistent mixing strategy can stabilize the distributions of soft supervision, taking advantage of the soft labels. As a result, we introduce a stabilized SelfMix augmentation that weakens the variation of the mixed images and corresponding soft labels by mixing similar regions within the same image. FerKD is an intuitive and well-designed learning system that eliminates several heuristics and hyperparameters of the former FKD solution. More importantly, it achieves remarkable improvement on ImageNet-1K and downstream tasks. For instance, FerKD achieves 81.2% on ImageNet-1K with ResNet-50, outperforming FKD and FunMatch by remarkable margins. Leveraging better pre-trained weights and larger architectures, our finetuned ViT-G14 even achieves 89.9%.
Our code is available at https://github.com/szq0214/FKD/tree/main/FerKD.
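
As a rough illustration of the calibration idea described in the abstract, the sketch below replaces the teacher's soft label for low-confidence crops with a label-smoothed ground-truth target and leaves confident crops untouched. It is not the released FerKD implementation: the function names, the use of the teacher's maximum probability as the confidence measure, and the `low_conf` and `smoothing` values are all assumptions made for this example.

```python
import numpy as np


def smoothed_one_hot(labels, num_classes, smoothing=0.1):
    """Label-smoothed one-hot targets (hypothetical helper).

    The true class gets 1 - smoothing; the remaining mass is spread
    evenly over the other classes, so each row sums to 1.
    """
    off_value = smoothing / (num_classes - 1)
    targets = np.full((labels.shape[0], num_classes), off_value, dtype=np.float64)
    targets[np.arange(labels.shape[0]), labels] = 1.0 - smoothing
    return targets


def calibrate_soft_labels(teacher_probs, labels, low_conf=0.3, smoothing=0.1):
    """Swap low-confidence teacher predictions for softened hard labels.

    Assumption: a crop counts as "less confident" when the teacher's
    maximum class probability falls below `low_conf`.
    Returns the calibrated targets and the boolean mask of replaced rows.
    """
    teacher_probs = np.asarray(teacher_probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    confidence = teacher_probs.max(axis=1)
    hard_mask = confidence < low_conf
    targets = teacher_probs.copy()
    targets[hard_mask] = smoothed_one_hot(
        labels[hard_mask], teacher_probs.shape[1], smoothing
    )
    return targets, hard_mask


# Two crops of 3-class images: the first is confident, the second is not.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25]])
gt = np.array([0, 1])
calibrated, replaced = calibrate_soft_labels(probs, gt, low_conf=0.5)
```

In this toy call only the second crop falls below the threshold, so its target becomes the smoothed ground-truth label while the first crop keeps the teacher's soft label.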

FAKD: Feature Augmented Knowledge Distillation for Semantic Segmentation

FCKDNet: A Feature Condensation Knowledge Distillation Network for Semantic Segmentation

Data-Free Adversarial Distillation

Knowledge Augmentation for Distillation: A General and Effective Approach to Enhance Knowledge Distillation

Channel-wise Distillation for Semantic Segmentation

Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation

Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation

Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation

Normalized Feature Distillation for Semantic Segmentation

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

Bridging Knowledge Distillation Gap for Few-sample Unsupervised Semantic Segmentation

Small Scale Data-Free Knowledge Distillation

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Spirit Distillation: Precise Real-time Semantic Segmentation of Road Scenes with Insufficient Data

TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

DFEF: Diversify feature enhancement and fusion for online knowledge distillation

Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms

Data-free Knowledge Distillation for Fine-grained Visual Categorization

FerKD: Surgical Label Adaptation for Efficient Distillation

Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data

Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation