Abstract: GenerateCT, the first approach to generating 3D medical imaging conditioned on free-form medical text prompts, incorporates a text encoder and three key components: a novel causal vision transformer for encoding 3D CT volumes, a text-image transformer for aligning CT and text tokens, and a text-conditional super-resolution diffusion model. Because no directly comparable methods exist in 3D medical imaging, we benchmarked GenerateCT against cutting-edge methods, demonstrating its superiority across all key metrics. Importantly, we evaluated GenerateCT's clinical applications in a multi-abnormality classification task. First, we established a baseline by training a multi-abnormality classifier on our real dataset. To further assess the model's generalization to external data and its performance with unseen prompts in a zero-shot scenario, we trained the classifier on an external set, setting an additional benchmark. We then conducted two experiments in which we doubled each training dataset by synthesizing an equal number of volumes with GenerateCT. The first experiment demonstrated an 11% improvement in the AP score when training the classifier jointly on real and generated volumes. The second experiment showed a 7% improvement when training on both real volumes and volumes generated from unseen prompts. Moreover, GenerateCT enables the scaling of synthetic training datasets to arbitrary sizes. As an example, we generated 100,000 3D CTs, five times the number in our real set, and trained the classifier exclusively on these synthetic CTs. This classifier surpassed the performance of the one trained on all available real data by a margin of 8%. Finally, domain experts evaluated the generated volumes, confirming a high degree of alignment with the text prompt.
Access our code, model weights, training data, and generated data at https://github.com/ibrahimethemhamamci/GenerateCT
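
The abstract reports its classification gains as AP scores but does not say how AP is aggregated across abnormalities. As a rough illustration only, the sketch below computes the common ranking-based average precision for each label and then takes an unweighted mean over labels. The function names `average_precision` and `mean_average_precision` and the macro-averaging choice are assumptions made for this example, not details taken from the paper or its repository.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AP for one label: mean of precision@k over the ranks k that hold a positive."""
    # Rank samples from highest to lowest predicted score.
    # sorted() is stable, so tied scores keep their input order.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Convention assumed here: a label with no positives scores 0.0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    labels: Sequence[Sequence[int]], scores: Sequence[Sequence[float]]
) -> float:
    """Unweighted mean of per-label AP (macro averaging, an assumption)."""
    n_labels = len(labels[0])
    per_label = [
        average_precision([row[j] for row in labels], [row[j] for row in scores])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


if __name__ == "__main__":
    # Toy data: 4 volumes, 2 hypothetical abnormality labels.
    labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.2], [0.8, 0.6], [0.7, 0.4], [0.1, 0.3]]
    print(mean_average_precision(labels, scores))
```

Comparing this value for a classifier trained on real volumes alone against one trained on real plus generated volumes would give the kind of AP difference the abstract describes, although the authors' exact evaluation protocol may differ.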

BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval

A Retrieval System for 3D Multi-Phase Contrast-Enhanced CT Images of Focal Liver Lesions Based on Combined Bags of Visual Words and Texture Words

Multistage and Multi-features Medical Image Retrieval System

3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology

Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study

Medical Image Retrieval System Using Multiple Features from 3D ROIs

Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography

BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports

Brain CT Database for Content-Based Image Retrieval

MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs

Triplet Contents based Medical Image Retrieval System for Lung Nodules CT Images Retrieval and Recognition Application

MedMNIST v2 -- A large-scale lightweight benchmark for 2D and 3D biomedical image classification

MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

CT Cadaveric dataset for Radiomics features stability assessment in lumbar vertebrae

Brain CT Image Database Building for Computer-Aided Diagnosis Using Content-Based Image Retrieval

TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines

AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking