How Does Diverse Interpretability of Textual Prompts Impact Medical Vision-Language Zero-Shot Tasks?

Sicheng Wang, Che Liu, Rossella Arcucci
2024-10-10
Abstract: Recent advancements in medical vision-language pre-training (MedVLP) have significantly enhanced zero-shot medical vision tasks such as image classification by leveraging large-scale medical image-text pair pre-training. However, the performance of these tasks can be heavily influenced by the variability in textual prompts describing the categories, necessitating robustness in MedVLP models to diverse prompt styles. Yet, this sensitivity remains underexplored. In this work, we are the first to systematically assess the sensitivity of three widely used MedVLP methods to a variety of prompts across 15 different diseases. To achieve this, we designed six unique prompt styles to mirror real clinical scenarios, which were subsequently ranked by interpretability. Our findings indicate that all MedVLP models evaluated show unstable performance across different prompt styles, suggesting a lack of robustness. Additionally, the models' performance varied with increasing prompt interpretability, revealing difficulties in comprehending complex medical concepts. This study underscores the need for further development in MedVLP methodologies to enhance their robustness to diverse zero-shot prompts.
Computer Vision and Pattern Recognition, Computation and Language, Machine Learning, Image and Video Processing
### What problems does this paper attempt to solve?

This paper systematically evaluates how sensitive three mainstream medical vision-language pre-training (MedVLP) models are to different text prompts in zero-shot classification tasks. Specifically, the authors focus on the following issues:

1. **Sensitivity of models to different text prompt styles**: Current MedVLP models perform inconsistently across prompt styles. Ideally, a MedVLP model should give consistent results for each disease category regardless of the prompt style (e.g., a simplified disease name or a detailed description). However, existing research has not fully explored this sensitivity.
2. **Ability to understand complex medical concepts**: Existing MedVLP models struggle with complex medical concepts. Their performance is affected as the interpretability of the prompts increases, indicating difficulty in understanding complex medical terms and descriptions.
3. **Zero-shot reasoning ability**: For unseen disease categories, MedVLP models should be able to make use of detailed, highly interpretable text prompts to improve prediction accuracy. However, how well existing models can do this is unclear.

To evaluate these issues, the authors designed six styles of text prompts and conducted experiments on three publicly available benchmark datasets. The prompt styles are: disease names, symptom descriptions, attribute descriptions, general English descriptions, radiologist-style descriptions, and medical-style descriptions. Through these experiments, the authors aim to reveal the limitations of current MedVLP models and offer suggestions for future research.

### Main findings

1. **Performance fluctuations**: All evaluated MedVLP models show significant performance fluctuations across prompt styles, indicating that they lack robustness to diverse prompt styles.
2. **Understanding of complex medical concepts**: The models have difficulty with complex medical concepts; in particular, performance drops significantly as the interpretability of the prompts increases.
3. **Zero-shot reasoning ability**: Only some models (such as MedKLIP) can exploit highly interpretable prompts for unseen disease categories, while other models (such as BioViL and KAD) show relatively limited performance in this regard.

### Conclusions and suggestions

Based on the above findings, the authors propose the following suggestions for improving MedVLP models:

- **Incorporate domain-knowledge enhancement methods**: Use external knowledge bases, such as UMLS, to incorporate medical-domain knowledge into the models and improve zero-shot diagnosis performance.
- **Use information-rich texts for pre-training**: The pre-training stage should include more descriptive and highly interpretable text prompts so that the models can better use this information at inference time.
- **Ensure the diversity of text styles in the pre-training dataset**: The pre-training dataset should cover various text styles, from simple disease names to detailed descriptions, to enhance the adaptability and robustness of the models.

These measures are expected to improve the performance and stability of MedVLP models when dealing with diverse zero-shot prompts.
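
The kind of evaluation summarized above, zero-shot classification repeated under several prompt styles, can be illustrated with a minimal sketch. This is not the authors' code: the two-dimensional embeddings, the labels, and the function names `zero_shot_predict`, `accuracy_by_style`, and `sensitivity` are all hypothetical, and a real MedVLP model would produce the embeddings with its image and text encoders. The sketch assumes the common zero-shot setup in which an image is assigned the label whose prompt embedding has the highest cosine similarity.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_predict(image_emb, prompt_embs):
    """Return the label whose prompt embedding is most similar to the image."""
    return max(prompt_embs, key=lambda label: cosine(image_emb, prompt_embs[label]))


def accuracy_by_style(images, prompts_by_style):
    """Compute zero-shot accuracy separately for each prompt style.

    images: list of (embedding, true_label) pairs
    prompts_by_style: style name -> {label: prompt embedding}
    """
    result = {}
    for style, prompt_embs in prompts_by_style.items():
        hits = sum(zero_shot_predict(emb, prompt_embs) == label
                   for emb, label in images)
        result[style] = hits / len(images)
    return result


def sensitivity(acc):
    """Summarize how much accuracy moves across prompt styles."""
    values = list(acc.values())
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "std": pstdev(values),
    }


# Hypothetical toy embeddings standing in for encoder outputs.
images = [
    ((1.0, 0.1), "pneumonia"),
    ((0.1, 1.0), "no finding"),
    ((0.9, 0.3), "pneumonia"),
    ((0.2, 0.8), "no finding"),
]

# Two of the six prompt styles, with made-up text embeddings per label.
prompts_by_style = {
    "disease_name": {"pneumonia": (1.0, 0.0), "no finding": (0.0, 1.0)},
    "radiologist_style": {"pneumonia": (0.2, 1.0), "no finding": (-0.2, 1.0)},
}

acc = accuracy_by_style(images, prompts_by_style)
summary = sensitivity(acc)
print(acc)
print(summary)
```

With this toy data the first style classifies all four images correctly and the second only two, so the accuracy range across styles is 0.5. A robust model in the sense discussed above would keep that range close to zero across all prompt styles.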