Self Pre-training with Single-Scale Adapter for Left Atrial Segmentation.
Can Tu, Ziyan Huang, Zhongying Deng, Yuncheng Yang, Chenglong Ma, Junjun He, Jin Ye, Haoyu Wang, Xiaowei Ding
DOI: https://doi.org/10.1007/978-3-031-31778-1_3
2022-01-01
Abstract: Accurate Left Atrial (LA) segmentation from Late Gadolinium Enhancement Magnetic Resonance Imaging (LGE MRI) is fundamental to the diagnosis of Atrial Fibrillation (AF). Previous approaches tended to address this problem by refining the network architecture to leverage spatial priors in medical imaging. However, modeling these priors is difficult because of low image quality and the varied shapes of the LA. In this paper, we instead try to learn the priors through generation. The motivation is simple: if a model can generate or recover image content well, it has likely learned the priors well. With the priors built in, such a model can better segment the LA. Specifically, we investigate the self pre-training paradigm, i.e., models are pre-trained and fine-tuned on the same LGE MRI dataset, based on the Masked Autoencoder (MAE). In the pre-training stage, we use Vision Transformer (ViT) based autoencoders to perform the pretext task of reconstructing the original MRI images from only a subset of patches, which encourages the ViT encoder to learn contextual information as priors by aggregating global information to recover the contents of the masked patches. In the fine-tuning stage, we further propose a single-scale adapter for the downstream task. The adapter first uses several branches with different numbers of upsampling blocks to compensate for the plain, non-hierarchical structure of the ViT, which better adapts the ViT to the dense prediction task. It then constructs a feature pyramid directly from the single-scale feature map of the ViT, using the multi-scale features from the different branches. Finally, the adapter incorporates a decoder that predicts the segmentation results from the feature pyramid. The proposed model (called ViTUNet) outperforms both the baseline trained from scratch and the widely used nnUNet model. The final trained model achieves validation scores of 0.89013, 1.70567 and 17.12375 for the Dice coefficient, ASD and HD metrics, respectively.
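The masked-reconstruction pretext task described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a 2D image for simplicity, a 16-pixel patch size, and a 75% mask ratio (none of which the abstract specifies), and the function names `patchify`, `random_mask` and `masked_mse` are hypothetical. It only shows how patches are hidden from the encoder and how the reconstruction loss is restricted to the hidden patches.

```python
import numpy as np


def patchify(image, patch):
    """Split a 2D image (H, W) into non-overlapping, flattened patches."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch) -> (gh, gw, patch, patch) -> (gh * gw, patch * patch)
    patches = image.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return patches.reshape(gh * gw, patch * patch)


def random_mask(patches, mask_ratio, rng):
    """Randomly hide a fraction of patches; return visible patches and the mask."""
    n = patches.shape[0]
    n_keep = int(round(n * (1.0 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)  # True marks a masked (hidden) patch
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx


def masked_mse(pred, target, mask):
    """Mean squared error computed over the masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


rng = np.random.default_rng(0)
image = rng.random((64, 64))          # stand-in for one MRI slice (assumption)
patches = patchify(image, patch=16)   # 16 patches of 256 pixels each
visible, mask, keep_idx = random_mask(patches, mask_ratio=0.75, rng=rng)

# An encoder would see only `visible`; a decoder would predict all patches.
# A trivial "prediction" (the mean visible intensity) stands in for the model here.
pred = np.full_like(patches, visible.mean())
loss = masked_mse(pred, patches, mask)
```

Restricting the loss to the masked patches means the model gets no credit for copying visible input, so it can only reduce the loss by inferring hidden content from context, which is the mechanism the abstract relies on for learning priors.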