CLIP in Medical Imaging: A Comprehensive Survey

Zihao Zhao, Yuxiao Liu, Han Wu, Mei Wang, Yonghao Li, Sheng Wang, Lin Teng, Disheng Liu, Zhiming Cui, Qian Wang, Dinggang Shen

2024-08-10

Abstract: Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training paradigm, successfully introduces text supervision to vision models. It has shown promising results across various tasks, attributable to its generalizability and interpretability. The use of CLIP has recently gained increasing interest in the medical imaging domain, serving both as a pre-training paradigm for aligning medical vision and language, and as a critical component in diverse clinical tasks. With the aim of facilitating a deeper understanding of this promising direction, this survey offers an in-depth exploration of the CLIP paradigm within the domain of medical imaging, regarding both refined CLIP pre-training and CLIP-driven applications. In this study, we (1) start with a brief introduction to the fundamentals of CLIP methodology. (2) Then, we investigate the adaptation of CLIP pre-training in the medical domain, focusing on how to optimize CLIP given the characteristics of medical images and reports. (3) Furthermore, we explore the practical utilization of CLIP pre-trained models in various tasks, including classification, dense prediction, and cross-modal tasks. (4) Finally, we discuss existing limitations of CLIP in the context of medical imaging and propose forward-looking directions to address the demands of the medical imaging domain. We expect that this comprehensive survey will provide researchers in the field of medical image analysis with a holistic understanding of the CLIP paradigm and its potential implications. The project page can be found at https://github.com/zhaozh10/Awesome-CLIP-in-Medical-Imaging.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper surveys the application and development of Contrastive Language-Image Pre-training (CLIP) in medical imaging. Specifically, its objectives are:

1. **Introduction to the basic principles of CLIP**: It briefly introduces the fundamentals of the CLIP method, a pre-training paradigm that learns interpretable visual representations through text supervision.
2. **Adapting CLIP pre-training to medical imaging**: It explores how to optimize CLIP for the characteristics of medical images and reports, particularly how to pre-train effectively on medical imaging datasets.
3. **Exploring CLIP-driven applications**: It discusses how pre-trained CLIP models can improve performance on various clinical tasks, such as classification, dense prediction (e.g., segmentation), and cross-modal tasks.
4. **Discussing existing limitations and future directions**: It analyzes the current limitations of CLIP in medical imaging and proposes forward-looking research directions to address them.

The paper also notes the growing trend of applying CLIP in medical imaging and how it meets the healthcare sector's demand for interpretable artificial intelligence. Additionally, it compares itself with related review articles and emphasizes that its distinguishing contribution is comprehensive coverage of both technical details and clinical applications. In summary, the paper aims to give researchers a comprehensive review of CLIP in medical imaging, while pointing out the key challenges and development trends in this area.
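To make the "text supervision" idea concrete, the following is a minimal sketch of the symmetric contrastive objective commonly associated with CLIP: paired image and text embeddings are L2-normalized, compared by cosine similarity, and trained so that each image matches its own text (and vice versa) within a batch. It is an illustration under stated assumptions, not the survey's implementation: the function names are hypothetical, the embeddings are plain Python lists standing in for encoder outputs, and the temperature is fixed at 0.07, whereas CLIP learns it during training.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over image-text cosine similarities.

    image_embs[k] and text_embs[k] are assumed to be a matched pair,
    e.g. a medical image and its report. The temperature is a fixed
    illustrative value here; in CLIP it is a learned parameter.
    """
    if not image_embs or len(image_embs) != len(text_embs):
        raise ValueError("need the same non-zero number of image and text embeddings")

    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should pick out its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: same idea over the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print("matched pairs:", clip_contrastive_loss(images, matched_texts))
    print("swapped pairs:", clip_contrastive_loss(images, swapped_texts))
```

With correctly paired embeddings the loss is close to zero, and it grows sharply when the pairs are swapped. That difference is the training signal that pulls matched image and text representations together and pushes mismatched ones apart.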

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

MedCLIP: Contrastive Learning from Unpaired Medical Images and Text

RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training

A Closer Look at the Explainability of Contrastive Language-Image Pre-training

Unified Medical Image-Text-Label Contrastive Learning With Continuous Prompt

Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis

CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning

UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition

Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography

ProtoCLIP: Prototypical Contrastive Language Image Pretraining

Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography

Non-Contrastive Learning Meets Language-Image Pre-Training

DiffCLIP: Few-shot Language-driven Multimodal Classifier

Contrastive pre-training and linear interaction attention-based transformer for universal medical reports generation