Abstract: Purpose: Accurate segmentation of cardiac structures on coronary CT angiography (CCTA) images is crucial for morphological analysis, measurement, and functional evaluation. In this study, we achieve accurate automatic segmentation of cardiac structures on CCTA images by adopting an innovative deep learning method based on a visual attention mechanism and a transformer network, and we discuss its practical application value. Methods: We developed a dual-input deep learning network based on visual saliency and transformer (VST), which incorporates a self-attention mechanism, for cardiac structure segmentation. CCTA scans from sixty randomly selected patients formed the development set and were manually annotated by an experienced technician. The proposed visual attention and transformer model was trained on the patients' CCTA images, with binary masks derived from the manual contours used as the learning target. We also used a deep supervision strategy by adding auxiliary losses. The loss function of our model was the sum of the Dice loss and the cross-entropy loss. To quantitatively evaluate the segmentation results, we calculated the Dice similarity coefficient (DSC) and the Hausdorff distance (HD). We also compared the volumes obtained from automatic and manual segmentation to test for statistically significant differences. Results: Fivefold cross-validation was used to benchmark the segmentation method. The DSC values were 0.87 for the left ventricular myocardium (LVM), 0.94 for the left ventricle (LV), 0.90 for the left atrium (LA), 0.92 for the right ventricle (RV), 0.91 for the right atrium (RA), and 0.96 for the aorta (AO). The average DSC was 0.92, and the HD was 7.2 ± 2.1 mm. In the volume comparison, no statistically significant difference was found for any structure except the LVM and LA (p < 0.05). The segmentations produced by the proposed method fit the true contours of the cardiac substructures well, and the model predictions were close to the manual annotations. Conclusions: The dual-input transformer architecture based on visual saliency shows high sensitivity and specificity for cardiac structure segmentation and can clearly improve the accuracy of automatic substructure segmentation. This is of gr.
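
The loss and the two evaluation metrics named in the Methods can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a single structure treated as a binary mask flattened to a list, foreground probabilities as model output, and boundary points given as coordinate tuples. The function names `dice_coefficient`, `dice_ce_loss`, and `hausdorff_distance` are hypothetical, and the deep-supervision auxiliary losses and the multi-class setting of the study are not reproduced.

```python
import math


def dice_coefficient(pred, target):
    """Dice similarity coefficient, 2*|A and B| / (|A| + |B|), for flat 0/1 masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def dice_ce_loss(probs, target, eps=1e-7):
    """Soft Dice loss plus mean binary cross-entropy for one structure (assumed form)."""
    intersection = sum(p * t for p, t in zip(probs, target))
    soft_dice = (2.0 * intersection + eps) / (sum(probs) + sum(target) + eps)
    cross_entropy = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        cross_entropy -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    cross_entropy /= len(target)
    return (1.0 - soft_dice) + cross_entropy


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty sets of coordinates."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(s, d) for d in dst) for s in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Toy example: a manual mask and an automatic mask over six voxels.
manual = [1, 1, 1, 0, 0, 0]
automatic = [1, 1, 0, 0, 0, 0]
dsc = dice_coefficient(automatic, manual)
hd = hausdorff_distance([(0, 0), (1, 0)], [(0, 0), (4, 0)])
```

The brute-force Hausdorff computation is quadratic in the number of points, so it suits only small contours. A reported HD in millimetres would additionally require scaling voxel indices by the scan's voxel spacing, which is not shown here.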

MSMHSA-DeepLab V3+: An Effective Multi-Scale, Multi-Head Self-Attention Network for Dual-Modality Cardiac Medical Image Segmentation

Enhancing Cardiac MRI Segmentation via Classifier-Guided Two-Stage Network and All-Slice Information Fusion Transformer

Two-Stage CNN Whole Heart Segmentation Combining Image Enhanced Attention Mechanism and Metric Classification

Automated cardiac segmentation of cross-modal medical images using unsupervised multi-domain adaptation and spatial neural attention structure

CardSegNet: An adaptive hybrid CNN-vision transformer model for heart region segmentation in cardiac MRI

A cascaded framework with cross-modality transfer learning for whole heart segmentation

Pseudo-3D Network for Multi-sequence Cardiac MR Segmentation

Transforming Heart Chamber Imaging: Self-Supervised Learning for Whole Heart Reconstruction and Segmentation

Multiple Attention Fully Convolutional Network for Automated Ventricle Segmentation in Cardiac Magnetic Resonance Imaging

Regional perception and multi-scale feature fusion network for cardiac segmentation

Multiscale attention guided U-Net architecture for cardiac segmentation in short-axis MRI images

Deep Learning Based Multi-modal Cardiac MR Image Segmentation

The auto segmentation for cardiac structures using a dual-input deep learning network based on vision saliency and transformer

Multi-Planar Deep Segmentation Networks for Cardiac Substructures from MRI and CT

MMTLNet: Multi-Modality Transfer Learning Network with adversarial training for 3D whole heart segmentation

MSA$^2$Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation

An Improved Combination of Faster R-CNN and U-Net Network for Accurate Multi-Modality Whole Heart Segmentation

MS-TCNet: An effective Transformer–CNN combined network using multi-scale feature learning for 3D medical image segmentation

BMCS-Net: A Bi-directional multi-scale cascaded segmentation network based on transformer-guided feature Aggregation for medical images

A Two-Stage Fully Automatic Segmentation Scheme Using Both 2D and 3D U-Net for Multi-sequence Cardiac MR

A Hybrid Enhanced Attention Transformer Network for Medical Ultrasound Image Segmentation