Reprogramming Distillation for Medical Foundation Models

Yuhang Zhou, Siyuan Du, Haolin Li, Jiangchao Yao, Ya Zhang, Yanfeng Wang

2024-07-09

Abstract: Medical foundation models pre-trained on large-scale datasets have demonstrated powerful, versatile capabilities across various tasks. However, because of the gap between pre-training tasks (or modalities) and downstream tasks (or modalities), and because of real-world computation and speed constraints, applying medical foundation models in downstream scenarios may not be straightforward. Previous methods, such as parameter-efficient fine-tuning (PEFT) and knowledge distillation (KD), cannot simultaneously address the task (or modality) inconsistency and achieve personalized lightweight deployment under diverse real-world demands. To address these issues, we propose a novel framework called Reprogramming Distillation (RD). On one hand, RD reprograms the original feature space of the foundation model so that it is more relevant to downstream scenarios, aligning tasks and modalities. On the other hand, through a co-training mechanism and a shared classifier, connections are established between the reprogrammed knowledge and the knowledge of student models, ensuring that the reprogrammed feature space can be smoothly mimicked by student models of different structures. Further, to reduce the randomness under different training conditions, we design a Centered Kernel Alignment (CKA) distillation to promote robust knowledge transfer. Empirically, we show that on extensive datasets, RD consistently achieves superior performance compared with previous PEFT and KD methods.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses how to apply pre-trained, large-scale medical foundation models effectively to downstream tasks. Existing methods, such as parameter-efficient fine-tuning (PEFT) and knowledge distillation (KD), are limited in two ways: they do not resolve the inconsistency between pre-training and downstream tasks or modalities, and they do not meet the lightweight deployment requirements of different practical scenarios at the same time. The paper therefore proposes a new framework called Reprogramming Distillation (RD), which has two core components: co-training reprogramming and Centered Kernel Alignment (CKA) distillation. Co-training reprogramming uses a trainable reprogramming module to adjust the feature space of the foundation model so that it is more relevant to downstream tasks. At the same time, a shared classifier and a co-training mechanism ensure that the reprogrammed features can be smoothly imitated by student models with different structures. CKA distillation reduces the randomness that arises under different training conditions and makes feature transfer more robust. Experiments show that RD outperforms previous PEFT and KD methods on multiple medical image datasets, especially on downstream tasks with little data. In addition, RD reduces GPU usage, improves parameter privacy, and allows customized deployment structures, so the model stays lightweight and flexible while being adapted to downstream tasks. In short, the paper proposes a framework for adapting medical foundation models to downstream tasks that improves model performance and makes practical deployment more efficient.
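To make the CKA distillation idea more concrete, here is a minimal NumPy sketch of a CKA-based feature loss. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say which kernel is used or where in the networks the loss is applied, so this sketch assumes the common linear form of CKA, and the function names `linear_cka` and `cka_distillation_loss` are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between feature matrices x (n, d1) and y (n, d2).

    Rows are samples. The two feature widths may differ, which is why
    CKA suits teacher/student pairs with different architectures.
    """
    # Center each feature dimension over the batch.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (norm_x * norm_y)


def cka_distillation_loss(teacher_feats, student_feats):
    """Assumed loss form: 1 - CKA, which is 0 when the features align."""
    return 1.0 - linear_cka(teacher_feats, student_feats)


# Toy data standing in for reprogrammed teacher features (assumption).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(32, 64))

# A student whose features are a rotated, rescaled copy of the teacher's.
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
student_rotated = 2.0 * teacher @ q

# A student with unrelated features of a different width.
student_random = rng.normal(size=(32, 16))

loss_rotated = cka_distillation_loss(teacher, student_rotated)
loss_random = cka_distillation_loss(teacher, student_random)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so `loss_rotated` is zero up to floating-point error while `loss_random` is clearly larger. This invariance is one plausible reading of why a CKA objective would be less sensitive to training randomness than matching features element by element.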

Towards Efficient Task-Driven Model Reprogramming with Foundation Models

QEKD: Query-Efficient and Data-Free Knowledge Distillation from Black-box Models

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

Online Knowledge Distillation with Diverse Peers

Enhancement of Knowledge Distillation via Non-Linear Feature Alignment

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication

Lightweight Self-Knowledge Distillation with Multi-source Information Fusion

Restructuring the Teacher and Student in Self-Distillation

Training Task Experts through Retrieval Based Distillation

Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation

DFGPD: a new distillation framework with global and positional distillation

Residual Error Based Knowledge Distillation

RDPD: Rich Data Helps Poor Data via Imitation

Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System

Multi-target Knowledge Distillation Via Student Self-reflection

Patient Knowledge Distillation for BERT Model Compression

Distilling a Powerful Student Model via Online Knowledge Distillation

DFD: Distilling the Feature Disparity Differently for Detectors

Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability