MediSyn: Text-Guided Diffusion Models for Broad Medical 2D and 3D Image Synthesis

Joseph Cho, Cyril Zakka, Dhamanpreet Kaur, Rohan Shad, Ross Wightman, Akshay Chaudhari, William Hiesinger
2024-07-10
Abstract: Diffusion models have recently gained significant traction due to their ability to generate high-fidelity and diverse images and videos conditioned on text prompts. In medicine, this application promises to address the critical challenge of data scarcity, a consequence of barriers in data sharing, stringent patient privacy regulations, and disparities in patient population and demographics. By generating realistic and varying medical 2D and 3D images, these models offer a rich, privacy-respecting resource for algorithmic training and research. To this end, we introduce MediSyn, a pair of instruction-tuned text-guided latent diffusion models with the ability to generate high-fidelity and diverse medical 2D and 3D images across specialties and modalities. Through established metrics, we show significant improvement in broad medical image and video synthesis guided by text prompts.
Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily addresses data scarcity in the medical field. Specifically:

1. **Data Scarcity Issue**: The lack of high-quality annotated datasets is a fundamental obstacle in medicine. Although the healthcare industry generates a large amount of data, annotating it requires significant time and expertise. In addition, acquiring and sharing medical data is subject to strict legal and privacy restrictions, which further exacerbates the scarcity.
2. **Class Imbalance Issue**: Medical data often reflect the disease distribution of a particular population, leading to significant class imbalance in the datasets. Certain populations may also be underrepresented, which can bias clinical decision support systems and limit their generalizability to new environments and populations.

To address these issues, the researchers introduce MediSyn, a pair of instruction-tuned, text-guided latent diffusion models (LDMs) that generate high-fidelity and diverse 2D and 3D medical images. Trained on a large-scale dataset of paired medical images and text, MediSyn can synthesize high-quality medical images and videos across multiple medical specialties and modalities, providing a rich, privacy-preserving data resource for algorithm training and research. Using established metrics, the authors report significant improvement in text-guided medical image and video synthesis.