Abstract: Medical image segmentation is a crucial topic in medical image processing. Accurately segmenting brain tumor regions from multimodal MRI scans is essential for clinical diagnosis and survival prediction. However, similar intensity distributions, variable tumor shapes, and fuzzy boundaries pose severe challenges for brain tumor segmentation. Traditional UNet-based segmentation networks struggle to establish explicit long-range dependencies in the feature space because of the limited receptive field of CNNs. Such dependencies are particularly important for dense prediction tasks such as brain tumor segmentation. Recent works have incorporated the global modeling capability of the Transformer into UNet to achieve more precise segmentation results. Nevertheless, these methods have two issues: (1) global information is often modeled by simply stacking Transformer layers in one specific module, which incurs high computational complexity and underuses the potential of the UNet architecture; (2) the rich boundary information of tumor subregions in multi-scale features is often overlooked. Motivated by these challenges, we propose an advanced fusion of the Transformer with UNet by re-examining its three core parts (encoder, bottleneck, and skip connections). First, we introduce a CNN-Transformer module in the encoder to replace the traditional CNN module, enabling the capture of deep spatial dependencies from input images. To handle high-level semantic information, we incorporate a computationally efficient spatial-channel attention layer in the bottleneck for global interaction, highlighting important semantic features from the encoder path output. For irregular lesions, we fuse the multi-scale encoder features with the decoder features in the skip connections by computing cross-attention. This adaptive querying of valuable information from multi-scale features enhances the boundary localization ability of the decoder path and suppresses redundant features with low correlation. Compared to existing methods, our model further enhances the learning capacity of the overall UNet architecture while maintaining low computational complexity. Experimental results on the BraTS2018 and BraTS2020 brain tumor segmentation datasets demonstrate that our model achieves comparable or superior results relative to recent CNN-based and Transformer-based models. The average DSC and HD95 are 0.854 and 6.688 on BraTS2018, and 0.862 and 5.455 on BraTS2020. Our model also achieves the best segmentation of enhancing tumors, showcasing the effectiveness of our method. Our code will be made publicly available at https://github.com/wzhangck/ETUnet.
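
For readers unfamiliar with the DSC values quoted above, the Dice similarity coefficient between a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch, assuming binary masks flattened to lists of 0/1 voxels; the function name `dice_score` and the empty-mask convention are illustrative assumptions, not the authors' evaluation code, and HD95 is not covered here.

```python
def dice_score(pred, target):
    """Dice similarity coefficient for two flat binary masks (sequences of 0/1).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled 1 in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total number of foreground voxels across both masks.
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is an
        # assumed convention, and evaluation toolkits differ on this case.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 2 overlapping voxels, 3 foreground voxels in each mask.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(round(dice_score(pred, target), 3))  # 2*2 / (3+3) -> 0.667
```

In a multi-class setting such as brain tumor segmentation, a score like this would typically be computed per tumor subregion and then averaged, though the exact averaging protocol used in the paper is not stated in the abstract.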

WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion

TransUNetCD: A Hybrid Transformer Network for Change Detection in Optical Remote-Sensing Images

Re-UNet: a novel multi-scale reverse U-shape network architecture for low-dose CT image reconstruction

FCTrans UNet: A Hybrid CNN and Transformer Model for Medical Image Segmentations

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

MCV-UNet: a modified convolution & transformer hybrid encoder-decoder network with multi-scale information fusion for ultrasound image semantic segmentation

ConvWin-UNet: UNet-like hierarchical vision Transformer combined with convolution for medical image segmentation

TransCT: Dual-path Transformer for Low Dose Computed Tomography

ITUnet: Integration Of Transformers And Unet For Organs-At-Risk Segmentation

AD-DUNet: A dual-branch encoder approach by combining axial Transformer with cascaded dilated convolutions for liver and hepatic tumor segmentation

iU-Net: a hybrid structured network with a novel feature fusion approach for medical image segmentation

ACF-TransUNet: Attention-based Coarse-Fine Transformer U-Net for Automatic Liver Tumor Segmentation in CT Images

D-UNet: a dimension-fusion U shape network for chronic stroke lesion segmentation

FTUNet: A Feature-Enhanced Network for Medical Image Segmentation Based on the Combination of U-Shaped Network and Vision Transformer

ETUNet: Exploring efficient transformer enhanced UNet for 3D brain tumor segmentation

CT-Net: Asymmetric compound branch Transformer for medical image segmentation

SW-UNet: a U-Net fusing sliding window transformer block with CNN for segmentation of lung nodules

HCT-Unet: multi-target medical image segmentation via a hybrid CNN-transformer Unet incorporating multi-axis gated multi-layer perceptron

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers