Abstract: UNet has been highly successful in various medical image segmentation tasks, but the restricted receptive field of convolutional operations limits its ability to explicitly model global context information. Vision Transformer captures global dependencies through self-attention (SA), thus alleviating the locality of the receptive field in convolutional neural network (CNN) architectures. However, the traditional Transformer typically relies on SA with high computational complexity, and its fusion mechanism is a static MLP, which is not efficient enough. In addition, current segmentation methods usually perform only simple feature fusion on the decoder side of the U-shaped architecture, which cannot meet the demand for important features when generating prediction maps. To solve these problems, we propose the E-TUNet network. On the one hand, we design the Enhanced Transformer as the encoder by introducing EMSA and DynaMixer MLP. The Enhanced Transformer has high computational efficiency and dynamic mixing weights, which alleviates the problem of a single static fusion mechanism. On the other hand, we introduce a G-L MLP block with global-local spatial interaction capability to form a hybrid cascaded upsampler for importance computation and matching of decoder-side features. The hybrid cascaded upsampler has stronger information representation capabilities and effectively combines CNN and MLP to capture local and global dependencies. We demonstrate the effectiveness of E-TUNet on two publicly available datasets, Synapse and ACDC. Extensive experiments show that our method is highly competitive compared to other methods: the mean DSC (%) is 82.15 on Synapse and 91.12 on ACDC, and the HD95 (mm) is 17.89 on the Synapse dataset. E-TUNet achieves a significant performance improvement in multi-organ segmentation tasks, reaching an advanced level.
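For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean Dice similarity coefficient (DSC, in %) can be computed from integer label maps. It uses the standard definition DSC = 2|A ∩ B| / (|A| + |B|) averaged over foreground labels. The function names, the toy arrays, and the convention of scoring an absent label as 1.0 are assumptions made for illustration; the abstract does not describe the authors' evaluation code, and HD95 is not covered here.

```python
import numpy as np


def dice_coefficient(pred, target, label):
    """Dice similarity coefficient for one label in two integer label maps."""
    pred_mask = pred == label
    target_mask = target == label
    denom = pred_mask.sum() + target_mask.sum()
    if denom == 0:
        # Assumption: a label absent from both maps counts as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred_mask, target_mask).sum()
    return 2.0 * intersection / denom


def mean_dice_percent(pred, target, labels):
    """Average the per-label Dice scores and express the result in percent."""
    scores = [dice_coefficient(pred, target, label) for label in labels]
    return 100.0 * float(np.mean(scores))


# Toy 3x3 label maps (hypothetical data): 0 is background, 1 and 2 are organs.
pred = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 0, 2]])
target = np.array([[0, 1, 1],
                   [0, 1, 2],
                   [0, 0, 2]])

print(mean_dice_percent(pred, target, labels=[1, 2]))
```

In this toy case each of the two labels has a Dice score of 0.8, so the mean DSC is about 80%. Background is excluded from the average here, which is a common but not universal choice.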

LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation

Mixed Transformer U-Net for Medical Image Segmentation

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

A More Design-Flexible Medical Transformer for Volumetric Image Segmentation.

Sfe-Transunet: A Transformer-Based U-Net With Skipped Features Enhancer For Medical Image Segmentation

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

EViT-Unet: U-Net Like Efficient Vision Transformer for Medical Image Segmentation on Mobile and Edge Devices

MobileUtr: Revisiting the relationship between light-weight CNN and Transformer for efficient medical image segmentation

LATrans-Unet: Improving CNN-Transformer with Location Adaptive for Medical Image Segmentation.

Enhanced Transformer Encoder and Hybrid Cascaded Upsampler for Medical Image Segmentation.

FDR-TransUNet: A novel encoder-decoder architecture with vision transformer for improved medical image segmentation

DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation

Trans-UNeter: A new Decoder of TransUNet for Medical Image Segmentation.

UNETR: Transformers for 3D Medical Image Segmentation

CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation

TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers

SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation

TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation.

FTUNet: A Feature-Enhanced Network for Medical Image Segmentation Based on the Combination of U-Shaped Network and Vision Transformer

EG-TransUNet: a transformer-based U-Net with enhanced and guided models for biomedical image segmentation

Big Model and Small Model : Remote Modeling and Local Information Extraction Module for Medical Image Segmentation.