Abstract: Scarcity of labels for medical images is a significant barrier to training representation learning approaches based on deep neural networks. This limitation is also present when using imaging data collected during routine clinical care and stored in picture archiving and communication systems (PACS), as these data rarely come with the high-quality labels required for medical image computing tasks. However, medical images extracted from PACS are commonly coupled with descriptive radiology reports that contain significant information and could be leveraged to pre-train imaging models, which could then serve as starting points for task-specific fine-tuning. In this work, we perform a head-to-head comparison of three different self-supervised strategies to pre-train the same imaging model on 3D brain computed tomography angiogram (CTA) images, with large vessel occlusion (LVO) detection as the downstream task. These strategies evaluate two natural language processing (NLP) approaches, one to extract 100 explicit radiology concepts (Rad-SpatialNet) and the other to create general-purpose radiology report embeddings (DistilBERT). In addition, we experiment with learning radiology concepts directly or by using a recent self-supervised learning approach (CLIP) that learns by ranking the distance between language and image vector embeddings. The LVO detection task was selected because it requires 3D imaging data, is clinically important, and requires the algorithm to learn outputs not explicitly stated in the radiology report. Pre-training was performed on an unlabeled dataset containing 1,542 3D CTA-report pairs. The downstream LVO detection task was tested on a labeled dataset of 402 subjects. We find that pre-training with CLIP-based strategies improves the performance of the imaging model in detecting LVO compared to a model trained only on the labeled data. The best performance was achieved by pre-training with the explicit radiology concepts and the CLIP strategy.
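
To make the CLIP-style objective mentioned in the abstract concrete, the following is a minimal sketch of the symmetric contrastive loss commonly associated with CLIP, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`l2_normalize`, `log_softmax`, `clip_loss`), the toy 2-dimensional embeddings, and the temperature of 0.07 are all hypothetical choices made for this example. In the setting described above, the image embeddings would come from the 3D CTA imaging model and the text embeddings from the report representation (extracted concepts or DistilBERT embeddings).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_z for x in row]


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row k of image_emb is assumed to be paired with row k of text_emb.
    The temperature value is an assumption chosen for illustration.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)

    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each image should rank its own report highest.
    image_to_text = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: each report should rank its own image highest.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    text_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (image_to_text + text_to_image)


# Toy batch: correctly paired embeddings versus deliberately mismatched ones.
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each image embedding is closest to its own report embedding and high when the pairs are mismatched, which is the ranking behaviour the abstract refers to.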

Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging

Improving Medical Vision-Language Contrastive Pretraining with Semantics-aware Triage

Contrastive pre-training and linear interaction attention-based transformer for universal medical reports generation

Contrastive self-supervised learning from 100 million medical images with optional supervision

Contrastive Learning of Medical Visual Representations from Paired Images and Text

Less is More: Selective Reduction of CT Data for Self-Supervised Pre-Training of Deep Learning Models with Contrastive Learning Improves Downstream Classification Performance

Self-supervised pre-training with contrastive and masked autoencoder methods for dealing with small datasets in deep learning for medical imaging

Learning Generalized Medical Image Representations through Image-Graph Contrastive Pretraining

Resource and data efficient self supervised learning

Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging

MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models

MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report

Exploring a Universal Training Method for Medical Image Classification

Why does my medical AI look at pictures of birds? Exploring the efficacy of transfer learning across domain boundaries

SELF-SUPERVISED LEARNING WITH RADIOLOGY REPORTS, A COMPARATIVE ANALYSIS OF STRATEGIES FOR LARGE VESSEL OCCLUSION AND BRAIN CTA IMAGES

MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning

Enhancing Biomedical Multi-modal Representation Learning with Multi-scale Pre-training and Perturbed Report Discrimination

MedCLIP: Contrastive Learning from Unpaired Medical Images and Text