Abstract

Purpose: Deep learning-based networks have become increasingly popular in medical image segmentation. The purpose of this research was to develop and optimize a new architecture for automatic segmentation of the prostate gland and normal organs in the pelvic, thoracic, and upper gastrointestinal (GI) regions.

Methods: We developed an architecture that combines a shifted-window (Swin) Transformer with a convolutional U-Net. The network includes a parallel encoder to extract local and global information, a cross-fusion block to merge related features at the same scale, and a CNN-based decoder. A skip connection between the cross-fusion block and the decoder integrates low-level semantic features. Attention gates (AGs) integrated within the CNN suppress features in image background regions. We term the network "SwinAttUNet" and optimized its architecture for automatic image segmentation. The data consisted of planning CT scans from 300 prostate cancer patients in an institutional database and 100 CT scans from the publicly available CT-ORG dataset. Images were resampled with linear interpolation to a voxel spacing of 1.0 × 1.0 × 1.5 mm³. Volume patches of 192 × 192 × 96 voxels were used for training and inference. The data were split into training (75%), validation (10%), and test (15%) cohorts. Data augmentation consisted of random flipping, rotation, and intensity scaling. The loss function was the equally weighted sum of the Dice and cross-entropy losses. We evaluated the Dice similarity coefficient (DSC), 95th-percentile Hausdorff distance (HD95), and average surface distance (ASD) between the network's segmentations and the ground truth.

Results: For SwinAttUNet, the DSC values were 86.54 ± 1.21%, 94.15 ± 1.17%, and 87.15 ± 1.68%; the HD95 values were 5.06 ± 1.42, 3.16 ± 0.93, and 5.54 ± 1.63 mm; and the ASD values were 1.45 ± 0.57, 0.82 ± 0.12, and 1.42 ± 0.38 mm for the prostate, bladder, and rectum, respectively. For the lung, liver, kidneys, and pelvic bones, the respective DSC values were 97.90 ± 0.80%, 96.16 ± 0.76%, 93.74 ± 2.25%, and 89.31 ± 3.87%. The respective HD95 values were 5.13 ± 4.11, 2.73 ± 1.19, 2.29 ± 1.47, and 5.31 ± 1.25 mm. The respective ASD values were 1.88 ± 1.45, 1.78 ± 1.21, 0.71 ± 0.43, and 1.21 ± 1.11 mm. Our network outperformed several existing deep learning approaches that use only attention-based convolutional features or only Transformer-based features, as detailed in the Results section.

Conclusions: We have demonstrated that our new architecture, which combines Transformer- and convolution-based features, better learns local and global context for automatic multi-organ segmentation of CT-based anatomy.

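The three evaluation metrics can be sketched in NumPy as below. The sketch assumes one binary mask per organ, with the resampled voxel spacing of 1.0 × 1.0 × 1.5 mm³ applied along the array axes in that order. It uses common conventions, which are assumptions here:

- Surface voxels are foreground voxels with a 6-connected background neighbour.
- HD95 is the 95th percentile of the pooled bidirectional surface distances.
- ASD is the mean of those pooled distances.

Published tools define HD95 and ASD slightly differently, so this is not necessarily the authors' exact definition. The brute-force pairwise distance is practical only for small masks. For full 192 × 192 × 96 patches, a Euclidean distance transform such as `scipy.ndimage.distance_transform_edt` with its `sampling` argument is the usual choice.

```python
import numpy as np

# Voxel spacing in mm after resampling (from the Methods). Assumes the array
# axes are ordered to match (1.0, 1.0, 1.5); adjust if a loader uses (z, y, x).
SPACING_MM = (1.0, 1.0, 1.5)


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient 2 * |pred AND gt| / (|pred| + |gt|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def surface_points(mask: np.ndarray, spacing=SPACING_MM) -> np.ndarray:
    """Physical coordinates (mm) of boundary voxels: foreground voxels with at
    least one 6-connected background neighbour (outside the array is background)."""
    padded = np.pad(mask.astype(bool), 1, constant_values=False)
    interior = padded.copy()
    for axis in range(padded.ndim):
        # A voxel is interior only if both neighbours along every axis are foreground.
        interior &= np.roll(padded, 1, axis=axis) & np.roll(padded, -1, axis=axis)
    boundary = (padded & ~interior)[tuple(slice(1, -1) for _ in range(mask.ndim))]
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)


def surface_distances(pred, gt, spacing=SPACING_MM) -> np.ndarray:
    """Pooled pred->gt and gt->pred nearest-surface distances in mm."""
    a, b = surface_points(pred, spacing), surface_points(gt, spacing)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks need at least one foreground voxel")
    # Brute-force pairwise distance matrix: fine for small masks only.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.concatenate([d.min(axis=1), d.min(axis=0)])


def hd95(pred, gt, spacing=SPACING_MM) -> float:
    """95th percentile of the pooled symmetric surface distances."""
    return float(np.percentile(surface_distances(pred, gt, spacing), 95))


def asd(pred, gt, spacing=SPACING_MM) -> float:
    """Mean of the pooled symmetric surface distances."""
    return float(surface_distances(pred, gt, spacing).mean())
```

Consider a 4 × 4 × 4 cube and a copy shifted by one voxel along the third (1.5 mm) axis. For this pair, `dice` returns 0.75, `hd95` returns 1.5 mm, and `asd` returns 0.5 mm. The example shows how the anisotropic spacing enters the distance-based metrics but not the DSC.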
A new architecture combining convolutional and transformer-based networks for automatic 3D multi-organ segmentation on CT images
