BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Yang Zhou, Tan Li Hui Faith, Yanyu Xu, Sicong Leng, Xinxing Xu, Yong Liu, Rick Siow Mong Goh

2024-10-29

Abstract: Medical Vision-Language Pretraining (MedVLP) shows promise in learning generalizable and transferable visual representations from paired and unpaired medical images and reports. MedVLP can provide useful features to downstream tasks and facilitate adapting task-specific models to new setups using fewer examples. However, existing MedVLP methods often differ in terms of datasets, preprocessing, and finetuning implementations. This poses great challenges in evaluating how well a MedVLP method generalizes to various clinically relevant tasks, due to the lack of a unified, standardized, and comprehensive benchmark. To fill this gap, we propose BenchX, a unified benchmark framework that enables head-to-head comparison and systematic analysis of MedVLP methods using public chest X-ray datasets. Specifically, BenchX is composed of three components: 1) comprehensive datasets covering nine datasets and four medical tasks; 2) benchmark suites to standardize data preprocessing, train-test splits, and parameter selection; 3) unified finetuning protocols that accommodate heterogeneous MedVLP methods for consistent task adaptation in classification, segmentation, and report generation. Utilizing BenchX, we establish baselines for nine state-of-the-art MedVLP methods and find that the performance of some early MedVLP methods can be enhanced to surpass more recent ones, prompting a revisiting of the developments and conclusions from prior works in MedVLP. Our code is available at https://github.com/yangzhou12/BenchX.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the lack of a unified, standardized, and comprehensive benchmark for medical vision-language pretraining (MedVLP), which makes it difficult to compare methods fairly and systematically. Existing MedVLP methods differ in their choice of datasets, preprocessing, and finetuning implementations, so it is hard to evaluate how well any one method generalizes across clinically relevant tasks.

To fill this gap, the authors propose BenchX, a unified benchmark framework that enables head-to-head comparison and systematic analysis of MedVLP methods using public chest X-ray datasets. The framework has three main components:

1. **Comprehensive Datasets**: Nine datasets covering four medical tasks.
2. **Benchmark Suites**: Standardized data preprocessing, train-test splits, and parameter selection, which reduce the effect of inconsistent experimental setups on reported MedVLP performance.
3. **Unified Finetuning Protocols**: Protocols that accommodate heterogeneous MedVLP methods so that task adaptation is consistent across classification, segmentation, and report generation.

Using BenchX, the authors establish baselines for nine state-of-the-art MedVLP methods and find that some early MedVLP methods can be enhanced to surpass more recent ones. This prompts a revisiting of the developments and conclusions of prior work in MedVLP.
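To make the idea of standardized train-test splits concrete, the following is a minimal sketch of how a benchmark might guarantee that every method is scored on exactly the same held-out samples. It is not BenchX code: the names `assign_split`, `accuracy`, and `compare_methods`, the hash-based split rule, and the toy "methods" are all hypothetical and chosen only for illustration.

```python
import hashlib


def assign_split(sample_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a sample to 'train' or 'test' from its ID.

    Hypothetical rule: hash the ID and map it to [0, 1). The same ID always
    lands in the same split, regardless of data order or random seed.
    """
    digest = hashlib.sha256(sample_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def accuracy(predict, samples):
    """Fraction of (sample_id, label) pairs that `predict` labels correctly."""
    if not samples:
        raise ValueError("cannot score an empty sample list")
    correct = sum(1 for sample_id, label in samples if predict(sample_id) == label)
    return correct / len(samples)


def compare_methods(methods, samples, test_fraction=0.2):
    """Score every method on one shared test set for a head-to-head comparison."""
    test_set = [
        (sample_id, label)
        for sample_id, label in samples
        if assign_split(sample_id, test_fraction) == "test"
    ]
    return {name: accuracy(predict, test_set) for name, predict in methods.items()}


if __name__ == "__main__":
    # Toy data: 200 sample IDs with alternating binary labels (illustrative only).
    samples = [(f"img_{i}", i % 2) for i in range(200)]
    methods = {
        "always_zero": lambda sample_id: 0,
        "oracle": lambda sample_id: int(sample_id.split("_")[1]) % 2,
    }
    print(compare_methods(methods, samples))
```

Because the split depends only on the sample ID, any two methods evaluated this way see identical test data, which is the property a shared benchmark split is meant to provide.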
