CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images

Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon

2024-01-14

Abstract: Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.

Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLAVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting.

Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.62 for seven major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports.

Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts. CXR-LLAVA is available at https://github.com/ECOFRI/CXR_LLAVA.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper aims to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs). The research builds on recent progress in large language models (LLMs), with the goal of replicating the image-interpretation skills of human radiologists, thereby improving diagnostic accuracy in clinical settings and reducing radiologists' workload. Specifically, the main objectives of the research are:

1. **Develop a multimodal large language model**: Combine a visual encoder with a large language model to obtain a multimodal model capable of interpreting chest X-ray images.
2. **Evaluate model performance**: Using internal and external test sets, evaluate how well the model identifies major pathological findings and whether the generated radiology reports meet the standards of human radiologists.
3. **Explore the potential for autonomous reporting**: Assess the feasibility of the model generating radiology reports automatically, without the intervention of human radiologists.

The model was trained on a large collection of publicly available datasets and optimized in multiple steps, and it performed well on the test sets. Despite some limitations, this research shows the considerable potential of multimodal large language models for chest X-ray image interpretation.
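The performance evaluation above is reported as an average F1 score over several pathological findings. The sketch below shows one way such a figure could be computed. It assumes an unweighted (macro) average of per-finding binary F1 scores, which the abstract does not confirm. The function names, finding names, and toy labels are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for one finding; labels are 1 (present) or 0 (absent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def average_f1(
    labels: Dict[str, List[int]], preds: Dict[str, List[int]]
) -> Tuple[float, Dict[str, float]]:
    """Unweighted mean of per-finding F1 scores (an assumed averaging scheme)."""
    per_finding = {name: f1_score(labels[name], preds[name]) for name in labels}
    return sum(per_finding.values()) / len(per_finding), per_finding


# Toy data for two hypothetical findings across four images.
toy_labels = {"finding_a": [1, 1, 0, 0], "finding_b": [1, 0, 1, 0]}
toy_preds = {"finding_a": [1, 0, 1, 0], "finding_b": [1, 0, 1, 0]}

mean_f1, per_finding_f1 = average_f1(toy_labels, toy_preds)
print(per_finding_f1, mean_f1)
```

A macro average weights every finding equally regardless of how often it occurs, so rare findings influence the headline number as much as common ones. A prevalence-weighted average would behave differently.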

SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation

WoLF: Wide-scope Large Language Model Framework for CXR Understanding

MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation

LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation

M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

Utility of Multimodal Large Language Models in Analyzing Chest X-ray with Incomplete Contextual Information

Exploring Multimodal Large Language Models for Radiology Report Error-checking

Multi-modal large language models in radiology: principles, applications, and potential

LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound

D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions

On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

Large Language Models: A Guide for Radiologists

CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting

Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation

Evaluating Large Language Models for Radiology Natural Language Processing

Advancing radiology practice and research: harnessing the potential of large language models amidst imperfections

LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

Constructing a Large Language Model to Generate Impressions from Findings in Radiology Reports

Large Language Models Diagnose Facial Deformity