Abstract: Foundation models pre-trained on large-scale data have achieved widespread success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios such as medical image analysis, has not been fully explored. In this work, we facilitate the study of the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of mainstream prompt tuning methods, both in how they introduce prompts and in their approximation capabilities on Transformer architectures, we propose the Embedded Prompt Tuning (EPT) method, which embeds prompt tokens into the expanded channels. We also find anomalies in the feature space distribution of foundation models that arise during the pre-training process, and prompt tuning can help mitigate this negative impact. To explain this phenomenon, we introduce a novel perspective on prompt tuning: prompt tuning is a distribution calibrator. We support this view by analyzing the patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks and completes the fine-tuning process within highly competitive time, indicating that EPT is an effective PEFT method. The source code is available at http://github.com/zuwenqiang/EPT.

What problem does this paper attempt to address?

This paper addresses the fact that the effectiveness of parameter-efficient fine-tuning (PEFT) methods in medical image classification has not been fully explored, especially in cross-domain few-shot scenarios. Specifically, it focuses on the following aspects:

1. **Adaptability of foundation models in medical image analysis**:
   - After pre-training on large-scale natural image data, foundation models (such as Vision Transformer) face a large domain gap on cross-domain medical image classification tasks, which degrades their performance.
   - Medical image data is limited and expensive to annotate, which makes applying these models directly difficult, while traditional training from scratch incurs huge computational and memory overheads.
2. **Limitations of existing PEFT methods**:
   - Existing parameter-efficient fine-tuning methods (such as VPT and VP) are deficient in how they introduce prompts: they cannot optimize input tokens in a fine-grained manner and can even significantly destroy the original information.
   - These methods have limited approximation capabilities on the Transformer architecture and perform especially poorly in cross-domain few-shot scenarios in medical image analysis.
3. **Abnormal feature space distribution**:
   - The feature space distribution of foundation models shows an anomaly that arises during pre-training: samples of the same category lie far apart in the feature space, which hurts classification.
   - Prompt tuning can alleviate this negative impact, but its mechanism has not been fully understood.

To solve the above problems, the paper proposes the **Embedded Prompt Tuning (EPT)** method. By embedding prompt tokens into the expanded channels, it retains the original information while introducing additional useful context to optimize the input tokens. In addition, the paper proposes a new perspective on prompt tuning: prompt tuning is essentially a distribution calibrator. This view is supported by an analysis of the patch-wise scaling and feature separation operations in EPT.

### Summary

The main objective of the paper is to improve the adaptability and performance of foundation models in medical image classification tasks, especially in cross-domain few-shot scenarios. It improves on existing PEFT methods by proposing EPT and explores in depth the nature of prompt tuning and its impact on feature distribution.
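The difference between appending prompts as extra tokens and embedding them into expanded channels can be illustrated with a minimal shape-level sketch. It is not the authors' implementation: the function names, the sizes `N`, `D`, `P`, `C`, and the zero-initialized prompts are all assumptions made for illustration, and the summary does not say how the Transformer weights handle the extra channels, so that part is left out.

```python
import random

def token_wise_prompt(tokens, prompts):
    """VPT-style: append prompt tokens to the sequence.
    (N, D) tokens + (P, D) prompts -> (N + P, D)."""
    return [row[:] for row in tokens] + [row[:] for row in prompts]

def channel_wise_prompt(tokens, prompts):
    """Channel expansion, as the summary describes EPT (assumed form):
    (N, D) tokens + (N, C) prompts -> (N, D + C).
    The first D channels of every token are left untouched."""
    if len(tokens) != len(prompts):
        raise ValueError("need one prompt row per token")
    return [tok + pr for tok, pr in zip(tokens, prompts)]

def shape(matrix):
    """Return (rows, columns) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))

rng = random.Random(0)
N, D, P, C = 4, 6, 2, 3  # hypothetical sizes: tokens, channels, prompt tokens, prompt channels
tokens = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
seq_prompts = [[0.0] * D for _ in range(P)]   # would be learnable in practice
chan_prompts = [[0.0] * C for _ in range(N)]  # would be learnable in practice

vpt_like = token_wise_prompt(tokens, seq_prompts)
ept_like = channel_wise_prompt(tokens, chan_prompts)

print(shape(vpt_like))  # (6, 6): longer sequence, same channel width
print(shape(ept_like))  # (4, 9): same sequence length, wider channels
```

In the channel-wise variant every original token value survives unchanged in its first `D` channels, which matches the summary's point that the original information is retained while extra context is attached to each token.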

Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity

Fine-grained Prompt Tuning: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification

Less Could Be Better: Parameter-efficient Fine-tuning Advances Medical Vision Foundation Models

Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification

Pre-training Everywhere: Parameter-Efficient Fine-Tuning for Medical Image Analysis via Target Parameter Pre-training

Prompt tuning for parameter-efficient medical image segmentation

Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation

Positional Prompt Tuning for Efficient 3D Representation Learning

Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models

FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification

Visual Fourier Prompt Tuning

E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models

PathoTune: Adapting Visual Foundation Model to Pathological Specialists

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

VPPT: Visual Pre-Trained Prompt Tuning Framework for Few-Shot Image Classification

Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis

FPT: Improving Prompt Tuning Efficiency Via Progressive Training

Prompt Your Brain: Scaffold Prompt Tuning for Efficient Adaptation of fMRI Pre-trained Model

Parameter Efficient Point Cloud Prompt Tuning for Unified Point Cloud Understanding