Abstract: Background and Objective: The Transformer, notable for its ability to model global context, has been used to remedy the shortcomings of convolutional neural networks (CNNs) and to break their dominance in medical image segmentation. However, the self-attention module is inefficient in both memory and computation, so many methods have to build their Transformer branch on heavily downsampled feature maps or adopt tokenized image patches to fit their models onto available GPUs. This patch-wise operation restricts the network's ability to extract pixel-level intrinsic structure and dependencies inside each patch, hurting performance on pixel-level classification tasks. Methods: To tackle these issues, we propose a memory- and computation-efficient self-attention module that enables reasoning on relatively high-resolution features, making the learning of global information more efficient while effectively capturing fine spatial details. Furthermore, we design a novel Multi-Branch Transformer (MultiTrans) architecture that provides hierarchical features for handling objects of variable shape and size in medical images. By building four parallel Transformer branches on different levels of the CNN, our hybrid network aggregates both multi-scale global contexts and multi-scale local features. Results: MultiTrans achieves the highest segmentation accuracy on three medical image datasets with different modalities: Synapse, ACDC and M&Ms. Compared to the Standard Self-Attention (SSA), the proposed Efficient Self-Attention (ESA) largely reduces training memory and computational complexity while even slightly improving accuracy. Specifically, the training memory cost, FLOPs and Params of our ESA are 18.77%, 20.68% and 74.07% of those of the SSA, respectively. Conclusions: Experiments on three medical image datasets demonstrate the generality and robustness of the designed network. The ablation study shows the efficiency and effectiveness of our proposed ESA.
Code is available at: https://github.com/Yanhua-Zhang/MultiTrans-extension .
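
To make the memory argument in the abstract concrete, the following minimal sketch counts the entries of the attention map that standard self-attention builds over a feature map: with N = H x W tokens, the map has N x N entries per head. This is only an illustration of the quadratic cost of standard self-attention, not the proposed ESA. The function name `attention_map_cost`, the 224 x 224 input size, the strides, and the 4-byte float assumption are all hypothetical and do not come from the paper.

```python
def attention_map_cost(height, width, bytes_per_value=4):
    """Return (tokens, attention-map entries, MiB) for one head of
    standard self-attention over a height x width feature map.

    Assumes one token per spatial position and 4-byte floats by default;
    both are illustrative assumptions, not values from the paper.
    """
    tokens = height * width            # N = H * W
    entries = tokens * tokens          # the attention map is N x N
    mib = entries * bytes_per_value / 2**20
    return tokens, entries, mib


if __name__ == "__main__":
    # Hypothetical setting: a 224 x 224 input downsampled by several strides.
    for stride in (4, 8, 16, 32):
        side = 224 // stride
        tokens, entries, mib = attention_map_cost(side, side)
        print(f"stride {stride:2d}: {side}x{side} map, "
              f"{tokens} tokens, {mib:.2f} MiB per head")
```

Halving the stride quadruples the token count and multiplies the attention-map size by sixteen, which is why attention over high-resolution features is expensive without a more efficient formulation.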
