Reprogramming Distillation for Medical Foundation Models

Yuhang Zhou, Siyuan Du, Haolin Li, Jiangchao Yao, Ya Zhang, Yanfeng Wang
2024-07-09
Abstract: Medical foundation models pre-trained on large-scale datasets have demonstrated powerful and versatile capabilities across a variety of tasks. However, because of the gap between pre-training tasks (or modalities) and downstream tasks (or modalities), together with real-world computation and speed constraints, applying medical foundation models in downstream scenarios may not be straightforward. Previous methods, such as parameter-efficient fine-tuning (PEFT) and knowledge distillation (KD), cannot simultaneously address the task (or modality) inconsistency and achieve personalized lightweight deployment under diverse real-world demands. To address these issues, we propose a novel framework called Reprogramming Distillation (RD). On the one hand, RD reprograms the original feature space of the foundation model so that it is more relevant to downstream scenarios, aligning tasks and modalities. On the other hand, through a co-training mechanism and a shared classifier, connections are established between the reprogrammed knowledge and the knowledge of student models, ensuring that the reprogrammed feature space can be smoothly mimicked by student models of different structures. Further, to reduce the randomness under different training conditions, we design a Centered Kernel Alignment (CKA) distillation to promote robust knowledge transfer. Empirically, we show that on extensive datasets, RD consistently achieves superior performance compared with previous PEFT and KD methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to apply large pre-trained medical foundation models effectively to downstream tasks. Existing methods, such as parameter-efficient fine-tuning (PEFT) and knowledge distillation (KD), are limited in two respects: they do not resolve the task or modality inconsistency between pre-training and downstream use, and they do not meet the lightweight deployment requirements of different practical scenarios. The paper therefore proposes a new framework called Reprogramming Distillation (RD), built on two core components: co-training reprogramming and Centered Kernel Alignment (CKA) distillation. Co-training reprogramming uses a trainable reprogramming module to adjust the feature space of the foundation model so that it is more relevant to downstream tasks. At the same time, a shared classifier and a co-training mechanism ensure that the reprogrammed features can be smoothly imitated by student models with different structures. CKA distillation reduces randomness under different training conditions and makes feature transfer more robust. Experiments show that RD significantly outperforms previous PEFT and KD methods on multiple medical image datasets, especially in downstream tasks with little data. RD also reduces GPU usage, improves parameter privacy, and allows customized deployment structures, so the model stays lightweight and flexible while being adapted to downstream tasks. In conclusion, the paper proposes a framework that addresses the challenges of adapting medical foundation models to downstream tasks, improving model performance and the efficiency of practical deployment.
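To make the CKA distillation idea more concrete, here is a minimal sketch of a CKA-based feature-matching loss. It assumes the linear form of Centered Kernel Alignment and a loss of the form 1 - CKA; the summary above does not specify which kernel or loss form the paper uses, so both are assumptions. The function names `linear_cka` and `cka_distill_loss`, the feature shapes, and the random data are hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two feature matrices with the same number of rows.

    x: (n_samples, d_x) features, e.g. from the reprogrammed teacher.
    y: (n_samples, d_y) features, e.g. from the student; d_y may differ from d_x.
    """
    # Center each feature dimension over the batch.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Linear-kernel CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


def cka_distill_loss(teacher_feats, student_feats):
    """Assumed loss form: smaller when the two feature sets are better aligned."""
    return 1.0 - linear_cka(teacher_feats, student_feats)


# Hypothetical batch: 32 samples, 64-dim teacher features, 16-dim student features.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(32, 64))
student = rng.normal(size=(32, 16))
loss = cka_distill_loss(teacher, student)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either feature set, and it compares features of different widths directly. That is one plausible reason a CKA-style objective suits students whose architecture differs from the teacher's, and why it can be less sensitive to run-to-run variation than pointwise feature matching.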