Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

Wenqiang Zu, Shenghao Xie, Qing Zhao, Guoqi Li, Lei Ma
DOI: https://doi.org/10.1016/j.media.2024.103258
2024-10-18
Abstract: Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios, e.g., medical image analysis, has not been fully explored. In this work, we facilitate the study of the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of prompt introducing ways and approximation capabilities on Transformer architectures of mainstream prompt tuning methods, we propose the Embedded Prompt Tuning (EPT) method by embedding prompt tokens into the expanded channels. We also find that there are anomalies in the feature space distribution of foundation models during pre-training process, and prompt tuning can help mitigate this negative impact. To explain this phenomenon, we also introduce a novel perspective to understand prompt tuning: Prompt tuning is a distribution calibrator. And we support it by analyzing patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks, and completes the fine-tuning process within highly competitive time, indicating EPT is an effective PEFT method. The source code is available at http://github.com/zuwenqiang/EPT.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the fact that the effectiveness of parameter-efficient fine-tuning (PEFT) methods in medical image classification has not been fully explored, especially in cross-domain few-shot scenarios. Specifically, it focuses on the following aspects:

1. **Adaptability of foundation models in medical image analysis**:
   - Foundation models (such as Vision Transformer) pre-trained on large-scale natural image data face a large domain gap when applied to medical image classification, which degrades their performance.
   - Medical image data are limited and costly to annotate, and training a model from scratch incurs large computational and memory overheads.
2. **Limitations of existing PEFT methods**:
   - Existing parameter-efficient fine-tuning methods (such as VPT and VP) are limited in how they introduce prompts: they cannot optimize input tokens in a fine-grained manner and may even significantly destroy the original information.
   - These methods have limited approximation capabilities on the Transformer architecture and perform poorly in cross-domain few-shot scenarios such as medical image analysis.
3. **Abnormal feature space distribution**:
   - The feature space distribution of foundation models shows anomalies arising during pre-training: samples of the same category can lie far apart in the feature space, which hurts classification.
   - Prompt tuning can alleviate this negative impact, but its mechanism has not been fully understood.

To solve the above problems, the paper proposes the **Embedded Prompt Tuning (EPT)** method. By embedding prompt tokens into the expanded channels, it retains the original information while introducing additional useful context to optimize the input tokens. The paper also proposes a new perspective on prompt tuning: prompt tuning is essentially a distribution calibrator. It supports this view by analyzing the patch-wise scaling and feature separation operations in EPT.

### Summary

The main objective of the paper is to improve the adaptability and performance of foundation models in medical image classification, especially in cross-domain few-shot scenarios. It improves on existing PEFT methods by proposing EPT, and it examines how prompt tuning works and how it affects the feature distribution.
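
The difference between adding prompts as extra tokens (as in VPT-style methods) and embedding prompts into expanded channels can be illustrated at the level of tensor shapes. The following is a minimal sketch, not the authors' implementation: the function names, the sizes, and the choice of one prompt vector per token are all illustrative assumptions, and the sketch does not cover how the Transformer weights handle the wider tokens.

```python
import random


def prepend_prompt_tokens(tokens, prompt_tokens):
    """Token-level prompting (VPT-style): prompts become extra sequence entries.

    The sequence gets longer; each original token keeps its width.
    """
    return [list(p) for p in prompt_tokens] + [list(t) for t in tokens]


def embed_prompts_in_channels(tokens, channel_prompts):
    """Channel-level embedding (hypothetical helper): each token gains extra channels.

    The sequence length is unchanged and the original channel values are
    kept untouched; prompt values occupy only the newly added channels.
    """
    if len(tokens) != len(channel_prompts):
        raise ValueError("this sketch assumes one prompt vector per token")
    return [list(t) + list(p) for t, p in zip(tokens, channel_prompts)]


random.seed(0)
num_tokens, dim = 4, 8          # assumed toy sizes
num_prompts, extra_dim = 2, 3   # assumed prompt count / added channels

tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
seq_prompts = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_prompts)]
chan_prompts = [[random.gauss(0, 1) for _ in range(extra_dim)] for _ in range(num_tokens)]

vpt_like = prepend_prompt_tokens(tokens, seq_prompts)
ept_like = embed_prompts_in_channels(tokens, chan_prompts)

print(len(vpt_like), len(vpt_like[0]))  # sequence grows, width unchanged
print(len(ept_like), len(ept_like[0]))  # sequence unchanged, width grows
```

In the token-level variant the prompts interact with the image tokens only through attention, whereas in the channel-level variant every token carries its own learnable values alongside its unmodified original features, which is the sense in which the original information is retained.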