Abstract: Medical image segmentation is a fundamental yet challenging task because acquiring large volumes of high-quality expert-labeled data is arduous. Contrastive learning offers a promising but still imperfect solution to this dilemma. Existing medical contrastive learning strategies focus on extracting image-level representations and ignore the abundant multi-level representations. They also underutilize the decoder, either by initializing it randomly or by pre-training it separately from the encoder, thereby neglecting the potential collaboration between the encoder and decoder. To address these issues, we propose a novel multi-level asymmetric contrastive learning framework named MACL for volumetric medical image segmentation pre-training. Specifically, we design an asymmetric contrastive learning structure that pre-trains the encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations, so that the encoder and decoder capture comprehensive details from representations of varying scales and granularities during the pre-training phase. Finally, experiments on 12 volumetric medical image datasets indicate that our MACL framework outperforms 11 existing contrastive learning strategies. In particular, MACL yields more precise predictions in the visualization figures and achieves an average Dice score 2.28%, 1.32%, 1.62% and 1.60% higher than the previous best results on CHD, MMWHS, CHAOS and AMOS, respectively. MACL also generalizes well across 5 U-Net backbone variants. Our code will be available at https://github.com/stevezs315/MACL.
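
To make the idea of combining contrastive objectives at several levels concrete, the following is a minimal sketch of a generic InfoNCE loss summed over multiple representation levels. It is not the MACL implementation: the function names (`info_nce`, `multi_level_loss`), the temperature value, and the per-level weights are assumptions introduced purely for illustration, and the abstract does not specify the exact loss form.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive for anchors[i]; every other entry of
    `positives` serves as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def multi_level_loss(levels, weights):
    """Weighted sum of InfoNCE losses, one per representation level.

    `levels` is a list of (anchors, positives) pairs, e.g. one pair each for
    feature-level, image-level, and pixel-level embeddings (hypothetical
    grouping; the weights are illustrative, not taken from the paper).
    """
    return sum(w * info_nce(a, p) for (a, p), w in zip(levels, weights))
```

In this sketch each level contributes its own batch of paired embeddings, and the weights control how much each granularity influences the total objective. Correctly paired embeddings give a loss near zero, while mismatched pairs are penalized heavily.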

Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training

Parameter-Efficient Transfer Learning for Medical Visual Question Answering

VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks

MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks

Improving Medical Vision-Language Contrastive Pretraining with Semantics-aware Triage

Contrastive Learning of Medical Visual Representations from Paired Images and Text

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

Medical Vision-Language Pre-Training for Brain Abnormalities

CAVL: Learning Contrastive and Adaptive Representations of Vision and Language

M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization

Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity

Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training

Time-, Memory- and Parameter-Efficient Visual Adaptation

Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training

Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training

Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis

Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias