MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

Sunan He, Yuxiang Nie, Zhixuan Chen, Zhiyuan Cai, Hongmei Wang, Shu Yang, Hao Chen
2024-04-23
Abstract: The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to construct vision-language datasets. Based on the constructed dataset, we developed MedDr, a generalist foundation model for healthcare capable of handling diverse medical data modalities, including radiology, pathology, dermatology, retinography, and endoscopy. Moreover, during inference, we propose a simple but effective retrieval-augmented medical diagnosis strategy, which enhances the model's generalization ability. Extensive experiments on visual question answering, medical report generation, and medical image diagnosis demonstrate the superiority of our method.
Computer Vision and Pattern Recognition, Computation and Language
What problem does this paper attempt to address?
The main problem this paper addresses is the data scarcity that holds back large vision-language models (LVLMs) in medicine. Although large-scale vision-language models have demonstrated strong capabilities across many tasks, their development in the medical domain has been severely limited by the lack of high-quality image-text data. To overcome this, the authors propose a Diagnosis-Guided Bootstrapping strategy, which uses image and label information to construct a vision-language dataset. Based on the constructed dataset, they developed MedDr, a generalist foundation model capable of handling various medical data modalities such as radiology, pathology, dermatology, retinography, and endoscopy. Additionally, the authors propose a simple Retrieval-Augmented Medical Diagnosis strategy, applied at inference time, to enhance the model's generalization capability.

### Main Contributions:

1. **Diagnosis-Guided Bootstrapping Strategy**: A novel data generation method is proposed, which generates high-quality medical reports by combining image and label information, ensuring that the generated data is both accurate and informative.
2. **MedDr Model**: A generalist medical foundation model is developed, capable of handling various medical data modalities and achieving state-of-the-art performance on multiple downstream tasks.
3. **Retrieval-Augmented Medical Diagnosis**: A retrieval-based strategy is proposed, which not only improves the model's prediction accuracy but also enhances its generalization capability.

### Problems Addressed:

- **Data Scarcity**: By using the Diagnosis-Guided Bootstrapping strategy, more training data is generated from existing high-quality medical image classification datasets.
- **Lack of Generalization Capability**: The retrieval-augmented strategy improves the model's diagnostic accuracy on rare or unseen diseases.
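The summary does not spell out how the retrieval-augmented diagnosis step works, so the following is only a minimal sketch of one plausible reading: retrieve the labels of the reference images most similar to the query, then pass them to the model as extra context. The function names (`cosine`, `retrieve_labels`, `build_prompt`), the toy two-dimensional embeddings, the cosine-similarity ranking, and the prompt wording are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_labels(query, bank, k=3):
    """Return the labels of the k reference entries most similar to the query.

    `bank` is a list of (embedding, label) pairs; in practice the embeddings
    would come from an image encoder (hypothetical here).
    """
    ranked = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)
    return [label for _, label in ranked[:k]]


def build_prompt(question, retrieved_labels):
    """Prepend a tally of the retrieved diagnoses to the question (assumed prompt format)."""
    counts = Counter(retrieved_labels)
    hints = ", ".join(f"{label} ({n})" for label, n in counts.most_common())
    return f"Diagnoses of similar reference images: {hints}.\n{question}"


# Toy reference bank: hand-written 2-D vectors standing in for image embeddings.
bank = [
    ([1.0, 0.0], "melanoma"),
    ([0.9, 0.1], "melanoma"),
    ([0.0, 1.0], "nevus"),
    ([0.1, 0.9], "nevus"),
]
labels = retrieve_labels([1.0, 0.05], bank, k=3)
prompt = build_prompt("What is the diagnosis?", labels)
print(prompt)
```

In this reading the retrieved diagnoses act as hints rather than a final answer, which is one way such a strategy could help on classes the model has rarely seen; the model still produces its own prediction from the image and the augmented prompt.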
### Experimental Results:

- MedDr performs strongly on visual question answering, medical report generation, and medical image diagnosis, outperforming other existing models.
- The retrieval-augmented strategy further enhances the model's performance, especially when dealing with rare diseases.

In summary, this paper improves the performance and reliability of large medical vision-language models through a new data generation method (diagnosis-guided bootstrapping) and an inference-time retrieval strategy.