Abstract

Purpose: Deep learning-based networks have become increasingly popular in medical image segmentation. The purpose of this research was to develop and optimize a new architecture for automatic segmentation of the prostate gland and normal organs in the pelvic, thoracic, and upper gastrointestinal (GI) regions.

Methods: We developed an architecture, termed "SwinAttUNet," that combines a shifted-window (Swin) transformer with a convolutional U-Net. The network includes a parallel encoder, a cross-fusion block, and a CNN-based decoder to extract local and global information and merge related features on the same scale. A skip connection is applied between the cross-fusion block and the decoder to integrate low-level semantic features. Attention gates (AGs) are integrated within the CNN to suppress features in image background regions. We optimized the architecture for automatic image segmentation. Training datasets consisted of planning-CT datasets from 300 prostate cancer patients from an institutional database and 100 CT datasets from a publicly available dataset (CT-ORG). Images were linearly interpolated and resampled to a spatial resolution of 1.0 × 1.0 × 1.5 mm³. A volume patch (192 × 192 × 96) was used for training and inference, and the dataset was split into training (75%), validation (10%), and test (15%) cohorts. Data augmentation consisted of random flip, rotation, and intensity scaling. The loss function was the equally weighted sum of Dice and cross-entropy losses. We evaluated Dice coefficients (DSC), 95th percentile Hausdorff distances (HD95), and average surface distances (ASD) between the results of our network and the ground truth data.

Results: For SwinAttUNet, DSC values were 86.54 ± 1.21, 94.15 ± 1.17, and 87.15 ± 1.68%, and HD95 values were 5.06 ± 1.42, 3.16 ± 0.93, and 5.54 ± 1.63 mm for the prostate, bladder, and rectum, respectively. Respective ASD values were 1.45 ± 0.57, 0.82 ± 0.12, and 1.42 ± 0.38 mm. For the lung, liver, kidneys, and pelvic bones, respective DSC values were 97.90 ± 0.80, 96.16 ± 0.76, 93.74 ± 2.25, and 89.31 ± 3.87%. Respective HD95 values were 5.13 ± 4.11, 2.73 ± 1.19, 2.29 ± 1.47, and 5.31 ± 1.25 mm. Respective ASD values were 1.88 ± 1.45, 1.78 ± 1.21, 0.71 ± 0.43, and 1.21 ± 1.11 mm. Our network outperformed several existing deep learning approaches that use only attention-based convolutional or Transformer-based feature strategies, as detailed in the results section.

Conclusions: We have demonstrated that our new architecture, which combines Transformer- and convolution-based features, is better able to learn the local and global context for automatic segmentation of multi-organ, CT-based anatomy.
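
The abstract reports Dice coefficients and describes a loss formed by summing Dice and cross-entropy terms with equal weight. The sketch below illustrates both ideas on flat binary masks in plain Python. It is not the authors' implementation: the function names, the binary (single-organ) simplification, the smoothing constants, and the convention for empty masks are all assumptions made for illustration, whereas the network in the abstract is multi-organ and operates on 3D patches.

```python
import math


def dice_coefficient(pred, truth):
    """Dice similarity coefficient (DSC) between two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def soft_dice_loss(probs, truth, eps=1e-6):
    """1 - soft Dice, computed on foreground probabilities in [0, 1]."""
    intersection = sum(p * t for p, t in zip(probs, truth))
    total = sum(probs) + sum(truth)
    # eps (an assumed smoothing constant) avoids division by zero.
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def binary_cross_entropy(probs, truth, eps=1e-7):
    """Mean voxel-wise binary cross-entropy."""
    losses = []
    for p, t in zip(probs, truth):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


def dice_ce_loss(probs, truth):
    """Equally weighted sum of the Dice and cross-entropy terms."""
    return soft_dice_loss(probs, truth) + binary_cross_entropy(probs, truth)


if __name__ == "__main__":
    truth = [1, 0, 0, 0]
    print(dice_coefficient([1, 1, 0, 0], truth))
    print(dice_ce_loss([0.9, 0.2, 0.1, 0.1], truth))
```

Multiplying `dice_coefficient` by 100 gives a percentage comparable in form to the DSC values quoted in the abstract. In a multi-organ setting, one common choice is to compute the Dice term per class and average it, but the abstract does not say how the classes were aggregated.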

Automatic Detection of Spine Region Using Multiple Pseudo 3D U-Net Models with Weighted Average Voting and Attention Mechanisms

DeU-Net 2.0: Enhanced deformable U-Net for 3D cardiac cine MRI segmentation

Automatic Segmentation, Localization, and Identification of Vertebrae in 3D CT Images Using Cascaded Convolutional Neural Networks

Automatic Lumbar Spinal CT Image Segmentation with a Dual Densely Connected U-Net

A new architecture combining convolutional and transformer-based networks for automatic 3D multi-organ segmentation on CT images

Improved distinct bone segmentation in upper-body CT through multi-resolution networks

Automatic CT Segmentation from Bounding Box Annotations using Convolutional Neural Networks

RUnT: A Network Combining Residual U-Net and Transformer for Vertebral Edge Feature Fusion Constrained Spine CT Image Segmentation

Automatic segmentation of rectal tumors from MRI using multiscale densely connected convolutional neural network based on attention mechanism

Multi-Scale Supervised 3D U-Net for Kidneys and Kidney Tumor Segmentation

RAU-Net: U-Net network based on residual multi-scale fusion and attention skip layer for overall spine segmentation

Re-UNet: A Novel Multi-scale Reverse U-shaped Network Architecture for Low-dose CT Image Reconstruction

Atrous Residual Interconnected Encoder to Attention Decoder Framework for Vertebrae Segmentation via 3D Volumetric CT Images

mm3DSNet: multi-scale and multi-feedforward self-attention 3D segmentation network for CT scans of hepatobiliary ducts

Automated Muscle Segmentation from Clinical CT Using Bayesian U-Net for Personalized Musculoskeletal Modeling

Brain tumor feature extraction and edge enhancement algorithm based on U-Net network

Combining CNN and Hybrid Active Contours for Head and Neck Tumor Segmentation in CT and PET images

Dual-Domain Reconstruction Network Incorporating Multi-Level Wavelet Transform and Recurrent Convolution for Sparse View Computed Tomography Imaging

CT-Based Automatic Spine Segmentation Using Patch-Based Deep Learning

A Novel Deep Learning Pipeline for Vertebra Labeling and Segmentation of Spinal Computed Tomography Images