CardiacSeg: Customized Pre-training Volumetric Transformer with Scaling Pyramid for 3D Cardiac Segmentation.

Zhiyu Ye, Hairong Zheng, Tong Zhang
DOI: https://doi.org/10.1007/978-3-031-52448-6_1
2024-01-01
Abstract: Congenital heart disease (CHD) is the most common type of birth defect and a leading cause of death worldwide. The volumetric segmentation of the whole heart anatomy serves as a basic step towards accurate diagnosis and treatment planning for CHD patients. Although deep learning segmentation networks can be powerful tools, it is still very challenging to apply them to CHD images due to the complex nature of the defect and the limited availability of training data and annotations. In this paper, we present CardiacSeg, a volumetric transformer for 3D cardiac image segmentation with masked image pre-training. Following the classic “U-shaped” encoder-decoder architecture, CardiacSeg is composed of a vision transformer (ViT) encoder, a scaling feature pyramid and a convolutional neural network decoder. Specifically, the scaling pyramid is generated solely from the output of the last layer of the encoder and converts the single-scale feature map into a multi-scale representation, thereby enabling the decoder to effectively reconstruct the segmentation results. We evaluated our pre-trained ViT backbone and downstream segmentation network on the 3D Computed Tomography Image Dataset for CHD (ImageCHD) and the Multi-Modality Whole Heart Segmentation Challenge (MM-WHS) dataset. To further validate the few-shot learning ability, we conducted comparison experiments using a randomly sampled 10% subset of the training data. Experimental results show that CardiacSeg outperforms five benchmark models, particularly in the few-shot learning scenario. The code will be open-sourced at https://openi.pcl.ac.cn/OpenMedIA/CardiacSeg.
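The scaling pyramid idea can be illustrated with a small shape calculation. The sketch below is not the authors' implementation: the input size (96 x 96 x 96), the patch size (16), the scale factors (4, 2, 1, 0.5) and the function names `token_grid` and `scaling_pyramid_shapes` are all assumptions chosen for illustration, since the abstract does not state them. It only shows how one single-scale token grid from the last encoder layer could be rescaled into several resolutions for a U-shaped decoder.

```python
from typing import Dict, Tuple

Shape3D = Tuple[int, int, int]


def token_grid(volume: Shape3D, patch: int) -> Shape3D:
    """Grid of non-overlapping cubic patches that a ViT encoder would tokenize."""
    if any(v % patch for v in volume):
        raise ValueError("each volume dimension must be divisible by the patch size")
    return tuple(v // patch for v in volume)


def scaling_pyramid_shapes(
    grid: Shape3D, factors: Tuple[float, ...] = (4, 2, 1, 0.5)
) -> Dict[float, Shape3D]:
    """Spatial size of each pyramid level, all derived from the same last-layer map.

    Factors above 1 stand for upsampling (e.g. transposed convolution),
    factors below 1 for downsampling (e.g. pooling). The factors are
    hypothetical, not taken from the paper.
    """
    shapes: Dict[float, Shape3D] = {}
    for f in factors:
        level = tuple(int(g * f) for g in grid)
        if any(s < 1 for s in level):
            raise ValueError(f"scale factor {f} collapses the grid {grid}")
        shapes[f] = level
    return shapes


if __name__ == "__main__":
    grid = token_grid((96, 96, 96), patch=16)  # assumed input and patch size
    for factor, shape in scaling_pyramid_shapes(grid).items():
        print(f"scale x{factor}: {shape}")
```

Under these assumptions the encoder yields a 6 x 6 x 6 token grid, and the pyramid levels range from 24 x 24 x 24 down to 3 x 3 x 3, giving the decoder the coarse-to-fine inputs that a hierarchical encoder would otherwise supply through skip connections.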