Abstract: Multimodal models trained on large natural image-text pair datasets have exhibited astounding abilities in generating high-quality images. Medical imaging data differs fundamentally from natural images, and the language used to succinctly capture relevant details in medical data draws on a different, narrow but semantically rich, domain-specific vocabulary. Not surprisingly, multimodal models trained on natural image-text pairs tend not to generalize well to the medical domain. Developing generative imaging models that faithfully represent medical concepts while providing compositional diversity could mitigate the existing paucity of high-quality, annotated medical imaging datasets. In this work, we develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest X-rays (CXR) and their corresponding radiology (text) reports. We investigate the model's ability to generate high-fidelity, diverse synthetic CXR conditioned on text prompts. We assess the model outputs quantitatively using image quality metrics, and have human domain experts evaluate image quality and text-image alignment. We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images, and that the output can be controlled to a new extent through free-form text prompts that include radiology-specific language. Fine-tuning this model on a fixed training set and using it as a data augmentation method, we measure a 5% improvement for a classifier trained jointly on synthetic and real images, and a 3% improvement when it is trained on a larger but purely synthetic training set. Finally, we observe that this fine-tuning distills in-domain knowledge into the text encoder and can improve its representation of certain diseases such as pneumothorax by 25%.

Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques

A vision–language foundation model for the generation of realistic chest X-ray images

Foundation Models in Radiology: What, How, When, Why and Why Not

Synthetically Enhanced: Unveiling Synthetic Data's Potential in Medical Imaging Research

Foundation AI Model for Medical Image Segmentation

On the Challenges and Perspectives of Foundation Models for Medical Image Analysis

Cascaded Latent Diffusion Models for High-Resolution Chest X-ray Synthesis

MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification

Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains

RoentGen: Vision-Language Foundation Model for Chest X-ray Generation

Navigating Data Scarcity using Foundation Models: A Benchmark of Few-Shot and Zero-Shot Learning Approaches in Medical Imaging

When is a Foundation Model a Foundation Model

Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data

Foundation model for cancer imaging biomarkers

How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound

Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision

A Survey on Trustworthiness in Foundation Models for Medical Image Analysis

Are Natural Domain Foundation Models Useful for Medical Image Classification?

A Framework for Evaluating the Efficacy of Foundation Embedding Models in Healthcare

Advanced image generation for cancer using diffusion models

Spot the fake lungs: Generating Synthetic Medical Images using Neural Diffusion Models