Tube Masking-Based MAE Pre-Training for Three-Dimensional Lumbar Vertebrae Segmentation

Yang Liu, Jian Chen, Shuhua Jin, Shijie Wei, Jinjin Hai, Xin Qi, Yongli Li, Bin Yan
DOI: https://doi.org/10.1109/icsp62122.2024.10743935
2024-01-01
Abstract: Quantitative computed tomography (QCT) is commonly used to measure bone mineral density (BMD) in clinical practice, with trained clinicians annotating the L1 vertebra region. However, this annotation consumes significant effort and time. Masked autoencoder (MAE) based pre-training can reduce the number of labels required while achieving comparable performance in automated image segmentation. The random masking strategy of MAE, however, is poorly suited to 3D images: because adjacent slices are highly similar, a randomly masked patch can often be recovered from a nearly identical visible patch in a neighboring slice, so the network may latch onto similar or identical features. We therefore propose DMU-Net, which combines tube masking with MAE pre-training and uses a UNetr network to fine-tune the pre-trained model on the L1 vertebra segmentation task. We compare DMU-Net against conventional MAE pre-training with UNetr fine-tuning and against UNetr alone for segmentation. The results demonstrate that DMU-Net outperforms both baselines.
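The difference between the two masking strategies can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the patch grid, the mask ratio, or the axis along which the tubes run, so the 8 x 6 x 6 grid, the 0.75 ratio, and the choice of the slice axis are assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def random_mask(depth, height, width, mask_ratio, rng):
    """Mask patches independently at every (slice, row, col) position."""
    n = depth * height * width
    n_masked = int(round(mask_ratio * n))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=n_masked, replace=False)] = True
    return flat.reshape(depth, height, width)

def tube_mask(depth, height, width, mask_ratio, rng):
    """Draw one in-plane mask and repeat it along the slice axis (assumed)."""
    n = height * width
    n_masked = int(round(mask_ratio * n))
    plane = np.zeros(n, dtype=bool)
    plane[rng.choice(n, size=n_masked, replace=False)] = True
    plane = plane.reshape(height, width)
    # Every slice shares the same masked positions, forming "tubes".
    return np.broadcast_to(plane, (depth, height, width)).copy()

rng = np.random.default_rng(0)
rand = random_mask(8, 6, 6, 0.75, rng)  # True marks a masked patch
tube = tube_mask(8, 6, 6, 0.75, rng)

# Under tube masking, a masked in-plane position is hidden in all slices,
# so it cannot be copied from a neighboring slice.
tube_positions_hidden_everywhere = int(tube.all(axis=0).sum())
rand_positions_hidden_everywhere = int(rand.all(axis=0).sum())
```

With random masking, a masked patch usually has a visible counterpart at the same in-plane position in some other slice. With tube masking that shortcut is removed, which matches the motivation given in the abstract.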