Abstract: Objective. Automatic multi-organ segmentation from anatomical images is essential in disease diagnosis and treatment planning. The U-shaped neural network with an encoder-decoder structure has achieved great success in various segmentation tasks. However, a pure convolutional neural network (CNN) is not suitable for modeling long-range relations because of its limited receptive field, and a pure transformer is not good at capturing pixel-level features. Approach. We propose a new hybrid network named MSCT-UNET, which fuses CNN features with transformer features at multiple scales and introduces multi-task contrastive learning to improve segmentation performance. Specifically, the multi-scale low-level features extracted by the CNN are further encoded through several transformers to build hierarchical global contexts. A cross fusion block then fuses the low-level and high-level features in different directions. The deep-fused features flow back to the CNN and transformer branches for fusion at the next scale. We introduce multi-task contrastive learning, comprising self-supervised global contrastive learning and supervised local contrastive learning, into MSCT-UNET. We also strengthen the decoder with a transformer to better restore the segmentation map. Results. Evaluation results on the ACDC, Synapse and BraTS datasets demonstrate improved performance over the compared methods. Ablation study results confirm the effectiveness of our major innovations. Significance. The hybrid encoder of MSCT-UNET can capture multi-scale long-range dependencies and fine-grained detail features at the same time. The cross fusion block can fuse these features deeply. The multi-task contrastive learning of MSCT-UNET can strengthen the representation ability of the encoder and jointly optimize the networks. The source code is publicly available at: https://github.com/msctunet/MSCT_UNET.git.
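The abstract does not state the exact form of the contrastive objectives. As a rough illustration of what a self-supervised global contrastive term can look like, the sketch below implements a generic InfoNCE-style loss in NumPy. This is an assumption for illustration, not the MSCT-UNET loss: the function name `info_nce_loss`, the `temperature` value, and the use of cosine similarity are all hypothetical choices, and the inputs stand in for pooled encoder embeddings of two augmented views of the same images.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's exact loss).

    z_a, z_b: arrays of shape (N, D). Row i of z_a and row i of z_b are
    embeddings of two views of the same image (a positive pair); all
    other rows in the batch act as negatives.
    """
    # L2-normalise so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # (N, N) similarity matrix; the diagonal holds the positive pairs.
    logits = z_a @ z_b.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the diagonal entry as the target class.
    return float(-np.mean(np.diag(log_prob)))

# Usage: two slightly perturbed views of the same (random) embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
view_a = z + 0.01 * rng.normal(size=z.shape)
view_b = z + 0.01 * rng.normal(size=z.shape)
loss_aligned = info_nce_loss(view_a, view_b)
loss_unrelated = info_nce_loss(z, rng.normal(size=z.shape))
```

Embeddings of matching views give a lower loss than unrelated embeddings, which is the property a contrastive term of this kind uses to shape the encoder's representation.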

Big Model and Small Model: Remote Modeling and Local Information Extraction Module for Medical Image Segmentation.

TF-Unet: An Automatic Cardiac MRI Image Segmentation Method

RFE-UNet: Remote Feature Exploration with Local Learning for Medical Image Segmentation

SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

A novel full-convolution UNet-transformer for medical image segmentation

Multiscale Transunet++: Dense Hybrid U-Net with Transformer for Medical Image Segmentation

A Novel Deep Learning Model for Medical Image Segmentation with Convolutional Neural Network and Transformer

Context-aware and local-aware fusion with transformer for medical image segmentation

Medical Image Segmentation Using Dual Branch Networks with Embedded Attention Mechanism.

TransCUNet: UNet cross fused transformer for medical image segmentation

Sfe-Transunet: A Transformer-Based U-Net With Skipped Features Enhancer For Medical Image Segmentation

Multi-scale Neighborhood Attention Transformer on U-Net for Medical Image Segmentation.

DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation

A Lightweight Multi-Scale Multi-Angle Dynamic Interactive Transformer-CNN Fusion Model for 3D Medical Image Segmentation

Isc-Transunet: Medical Image Segmentation Network Based On The Integration Of Self-Attention And Convolution

MSCT-UNET: multi-scale contrastive transformer within U-shaped network for medical image segmentation

E-Transunet: Enhanced Transunet for Medical Image Segmentation

EG-TransUNet: a transformer-based U-Net with enhanced and guided models for biomedical image segmentation

From CNN to Transformer: A Review of Medical Image Segmentation Models

MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet