Abstract: Most recent 3D medical image segmentation methods adopt convolutional neural networks (CNNs), which rely on deep feature representations and achieve adequate performance. However, because convolutional architectures have limited receptive fields, they cannot explicitly model long-range dependencies in medical images. Transformers, by contrast, capture global dependencies through self-attention mechanisms and learn highly expressive representations. Some recent works build on Transformers, but existing Transformers incur extreme computational and memory costs, and they cannot take full advantage of their powerful feature representations in 3D medical image segmentation. In this paper, we connect streams of different resolutions in parallel and propose a novel network, the Transformer-based High Resolution Network (TransHRNet), with an Effective Transformer (EffTrans) block, which retains sufficient feature representation even at high feature resolutions. Given a 3D image, the encoder first uses a CNN to extract feature representations that capture local information. The feature maps at different resolutions are then reshaped into tokens and fed into parallel Transformer streams, which learn global information and repeatedly exchange information across streams. Because such a framework built on the standard Transformer requires a huge amount of computation, we introduce a deep and effective Transformer that delivers better performance with fewer parameters. The proposed TransHRNet is evaluated on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV) dataset, which covers 11 major human organs, and on the Medical Segmentation Decathlon (MSD) dataset for the brain tumor and spleen segmentation tasks. Experimental results show that it outperforms convolutional and other related Transformer-based methods on 3D multi-organ segmentation tasks.
Code is available at https://github.com/duweidai/TransHRNet .
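
The computational concern raised in the abstract can be illustrated with simple arithmetic. The sketch below is not taken from the TransHRNet code. It assumes every voxel of a 3D feature map becomes one token and that standard self-attention builds a tokens-by-tokens score matrix. The patch size, the strides, and the helper names `tokens_per_stream` and `attention_entries` are hypothetical and chosen only for illustration.

```python
from math import prod


def tokens_per_stream(volume_shape, strides):
    """Token count per resolution stream, assuming one token per voxel."""
    counts = []
    for stride in strides:
        # Each spatial axis shrinks by the stride (ceiling division).
        grid = tuple(-(-dim // stride) for dim in volume_shape)
        counts.append(prod(grid))
    return counts


def attention_entries(num_tokens):
    """Entries in one full self-attention score matrix (tokens x tokens)."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    shape = (96, 96, 96)   # hypothetical input patch, not from the paper
    strides = (4, 8, 16)   # hypothetical downsampling factor of each stream
    for stride, n in zip(strides, tokens_per_stream(shape, strides)):
        print(f"stride {stride}: {n} tokens, "
              f"{attention_entries(n)} attention entries")
```

Under these assumptions, halving the stride multiplies the token count by 8 and the size of the attention matrix by 64, which is why keeping a Transformer stream at high resolution calls for a cheaper attention block.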

SuperFormer: Volumetric Transformer Architectures for MRI Super-Resolution

Rethinking Multi-Contrast MRI Super-Resolution: Rectangle-Window Cross-Attention Transformer and Arbitrary-Scale Upsampling

UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation

SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation

A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation

MTVNet: Mapping using Transformers for Volumes -- Network for Super-Resolution with Long-Range Interactions

Super Images -- A New 2D Perspective on 3D Medical Imaging Analysis

TransMRSR: Transformer-based Self-Distilled Generative Prior for Brain MRI Super-Resolution

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

3D Brainformer: 3D Fusion Transformer for Brain Tumor Segmentation

SVoRT: Iterative Transformer for Slice-to-Volume Registration in Fetal Brain MRI

Improved Super Resolution of MR Images Using CNNs and Vision Transformers

DeepVolume: Brain Structure and Spatial Connection-Aware Network for Brain MRI Super-Resolution

TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images

Multicontrast MRI Super-Resolution Via Transformer-Empowered Multiscale Contextual Matching and Aggregation

TransBTSV2: Wider Instead of Deeper Transformer for Medical Image Segmentation

Cross-Modality High-Frequency Transformer for MR Image Super-Resolution

VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion

Multi-Aperture Fusion of Transformer-Convolutional Network (MFTC-Net) for 3D Medical Image Segmentation and Visualization

3D Medical image segmentation using parallel transformers

Medical Transformer: Universal Brain Encoder for 3D MRI Analysis