Abstract: Vision Transformers (ViTs) have achieved significant advances in computer vision tasks thanks to their powerful modeling capacity. However, their performance degrades notably when they are trained on insufficient data, because they lack inherent inductive biases. Distilling knowledge and inductive biases from a Convolutional Neural Network (CNN) teacher has emerged as an effective strategy for improving the generalization of ViTs on limited datasets. Previous approaches to Knowledge Distillation (KD) have followed two main paths. Some distill only the logit distribution from the CNN teacher to the ViT student, neglecting the rich semantic information in intermediate features because of the structural differences between the two models. Others combine feature distillation with logit distillation, but this requires alignment operations that limit the amount of knowledge transferred between mismatched architectures and increase the computational overhead. To this end, this paper presents the Hybrid Data-efficient Knowledge Distillation (HDKD) paradigm, which employs a CNN teacher and a hybrid student. The choice of a hybrid student serves two purposes. First, it leverages the strengths of both convolutions and transformers while sharing the convolutional structure with the teacher model. Second, this shared structure enables the direct application of feature distillation without any information loss or additional computational overhead. Additionally, we propose an efficient lightweight convolutional block named Mobile Channel-Spatial Attention (MBCSA), which serves as the primary convolutional block in both the teacher and student models. Extensive experiments on two public medical datasets demonstrate the superiority of HDKD over other state-of-the-art models, as well as its computational efficiency. Source code: https://github.com/omarsherif200/HDKD
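
The combination of logit distillation and direct feature distillation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss formulation, so everything below is an assumption made for illustration: the temperature-softened KL divergence for logits, the mean squared error for features, the weights `alpha` and `beta`, and all function names are hypothetical and are not taken from the HDKD source code. The point it shows is that when teacher and student features have the same shape, they can be compared element by element with no projection or alignment layer.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures (an assumed, common choice).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return kl * temperature ** 2


def feature_distillation_loss(student_features, teacher_features):
    """Mean squared error between two flattened feature maps.

    Both inputs must have the same length: this mirrors the idea that a
    shared convolutional structure needs no alignment operation.
    """
    if len(student_features) != len(teacher_features):
        raise ValueError("feature maps must have the same shape")
    squared = [(s - t) ** 2 for s, t in zip(student_features, teacher_features)]
    return sum(squared) / len(squared)


def combined_loss(task_loss, student_logits, teacher_logits,
                  student_features, teacher_features,
                  alpha=0.5, beta=1.0, temperature=2.0):
    """Weighted sum of task, logit, and feature terms (weights are hypothetical)."""
    logit_term = logit_distillation_loss(student_logits, teacher_logits, temperature)
    feature_term = feature_distillation_loss(student_features, teacher_features)
    return (1.0 - alpha) * task_loss + alpha * logit_term + beta * feature_term
```

In a real training loop these terms would be computed on tensors by a deep learning framework; the plain-Python version is only meant to make the structure of the objective explicit.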

MED-TEX: Transferring and Explaining Knowledge with Less Data from Pretrained Medical Imaging Models

MSKD: Structured knowledge distillation for efficient medical image segmentation

Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on Limit-Trained Multi-Teacher Models

Efficient knowledge distillation for liver CT segmentation using growing assistant network

Learning Interpretation with Explainable Knowledge Distillation

Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation

Multiple Teachers-Meticulous Student: A Domain Adaptive Meta-Knowledge Distillation Model for Medical Image Classification

Evaluating Knowledge Transfer in Neural Network for Medical Images

Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

ClearKD: Clear Knowledge Distillation for Medical Image Classification

Distilling Knowledge from Deep Networks with Applications to Healthcare Domain

RSKD: Enhanced medical image segmentation via multi-layer, rank-sensitive knowledge distillation in Vision Transformer models

HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification

More From Less: Self-Supervised Knowledge Distillation for Routine Histopathology Data

Symbolic Knowledge Extraction and Distillation into Convolutional Neural Networks to Improve Medical Image Classification

Efficient Medical Image Segmentation Based on Knowledge Distillation

Interpretable and synergistic deep learning for visual explanation and statistical estimations of segmentation of disease features from medical images

A Medical Image Segmentation Method Combining Knowledge Distillation and Contrastive Learning

Attention to detail: inter-resolution knowledge distillation

Improving Knowledge Distillation with Teacher's Explanation

From explanation to intervention: Interactive knowledge extraction from Convolutional Neural Networks used in radiology