Abstract: Background and Objectives: The transformer is a model that relies entirely on self-attention and has a wide range of applications in natural language processing. Following the rapid development of transformers in vision over the past few years, with models such as the vision transformer (ViT) and the Swin transformer, researchers have begun to apply transformers to medical images. In the last year in particular, many scholars have applied transformers to medical image segmentation and achieved good results, making transformer-based medical image segmentation one of the hot spots in this field. The purpose of this work is to categorize and review transformer-based segmentation methods for medical images, covering both U-Net-based transformer models and transformer models built on other architectures. Methods: This paper summarizes transformer-based segmentation models for the abdominal organs, heart, brain, and lungs, drawing on relevant studies from the last two years. We describe and analyze the model structures, including the position of the transformer in the model, the changes scholars have made to the transformer, and how it is combined with the rest of the model. Segmentation performance is compared using the Dice evaluation metric. Results: Across 93 references, we find that researchers prefer U-Net-based transformer models to other designs and tend to place the transformer structure in the encoder. These new models improve segmentation performance compared with U-Net and other segmentation models. However, there are few related studies on the lungs, which points to a direction for future research. Conclusions: We found that the combination of U-Net and transformer is more suitable for segmentation. In future research on medical image segmentation, researchers can choose a suitable transformer-based segmentation method or modify the transformer structure according to the segmentation requirements. We hope that this work will help improve transformers for solving clinical problems in medicine.
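
For reference, the Dice metric used in the comparisons measures the overlap between a predicted mask A and a ground-truth mask B as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that formula, assuming binary masks flattened into sequences of 0/1 values; the function name `dice_coefficient` and the convention for two empty masks are illustrative assumptions, not taken from any of the reviewed works.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap, 2 * |A and B| / (|A| + |B|), for two binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count positions where both masks mark the foreground.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground size of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: a flattened 6-pixel prediction and ground truth.
pred = [0, 1, 1, 0, 1, 0]
target = [0, 1, 0, 0, 1, 1]
print(f"Dice = {dice_coefficient(pred, target):.3f}")
```

In the toy example, each mask has three foreground pixels and two of them coincide, so the score is 4/6. Multi-organ results are typically reported by computing such a score per class and averaging, although the exact protocol differs between studies.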

Transformer-based heart organ segmentation using a novel axial attention and fusion mechanism

TF-Unet: An Automatic Cardiac MRI Image Segmentation Method

Mixed Transformer U-Net for Medical Image Segmentation

AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation

Vision Transformers increase efficiency of 3D cardiac CT multi-label segmentation

H2MaT-Unet: Hierarchical hybrid multi-axis transformer based Unet for medical image segmentation

Axial Attention Transformer Networks: A New Frontier in Breast Cancer Detection

TSCA-Net: Transformer based spatial-channel attention segmentation network for medical images

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

Transformers in medical image segmentation: A review

U-Net Transformer: Self and Cross Attention for Medical Image Segmentation

UNETR: Transformers for 3D Medical Image Segmentation

Dual encoder network with transformer-CNN for multi-organ segmentation

ATTransUNet: an Enhanced Hybrid Transformer Architecture for Ultrasound and Histopathology Image Segmentation

UNesT: Local Spatial Representation Learning with Hierarchical Transformer for Efficient Medical Segmentation

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

A Multi-Scale Cross-Fusion Medical Image Segmentation Network Based on Dual-Attention Mechanism Transformer

A novel full-convolution UNet-transformer for medical image segmentation

DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation

A Hybrid Enhanced Attention Transformer Network for Medical Ultrasound Image Segmentation