Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation

Pengfei Gu, Yejia Zhang, Huimin Li, Chaoli Wang, Danny Z. Chen
2024-07-16
Abstract: Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems. By reconstructing missing pixel/voxel information in visible patches, a ViT encoder can aggregate contextual information for downstream tasks. But, existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack the ability to capture geometric shape and spatial information, which is critical for medical image segmentation tasks. In this paper, we propose a novel extension of known MAEs for self pre-training (i.e., models pre-trained on the same target dataset) for 3D medical image segmentation. (1) We propose a new topological loss to preserve geometric shape information by computing topological signatures of both the input and reconstructed volumes, learning geometric shape information. (2) We introduce a pre-text task that predicts the positions of the centers and eight corners of 3D crops, enabling the MAE to aggregate spatial information. (3) We extend the MAE pre-training strategy to a hybrid state-of-the-art (SOTA) medical image segmentation architecture and co-pretrain it alongside the ViT. (4) We develop a fine-tuned model for downstream segmentation tasks by complementing the pre-trained ViT encoder with our pre-trained SOTA model. Extensive experiments on five public 3D segmentation datasets show the effectiveness of our new approach.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses three key challenges in 3D medical image segmentation:

1. **Capturing geometric shape information**: Existing Masked Autoencoder (MAE) methods do not capture geometric shape information well during pre-training, although it is crucial for medical image segmentation. For example, when reconstructing missing pixel/voxel information, existing methods often ignore the shape of the overall object (see Figure 1 of the paper).
2. **Exploring global spatial information**: Existing MAE methods focus mainly on reconstructing information from locally masked sub-volumes and may overlook the global context of the target object.
3. **Compatibility with other common medical image segmentation architectures**: Existing MAE pre-training strategies were developed mainly for the Vision Transformer (ViT) architecture, which limits their adaptability and effectiveness for other architectures, such as those based on Convolutional Neural Networks (CNNs) or hybrid models.

To address these problems, the authors propose an extension of MAEs for self pre-training (that is, pre-training on the same target dataset) for 3D medical image segmentation. The method has four components:

1. **Topological loss**: Preserve geometric shape information by computing topological signatures of both the input and reconstructed volumes. The signatures are computed with cubical complexes, and the loss is defined using an optimal transport distance (the 2-Wasserstein distance) between them.
2. **Pre-text task**: Predict the positions of the center and eight corner points of each 3D crop, enabling the model to aggregate spatial information.
3. **Extension to hybrid architectures**: Extend the MAE pre-training strategy to a hybrid state-of-the-art (SOTA) medical image segmentation architecture (such as UNETR++) and co-pre-train it alongside the ViT.
4. **Fine-tuning the model for downstream tasks**: Build the fine-tuned model by combining the pre-trained ViT encoder with the pre-trained UNETR++ model to improve performance on downstream segmentation tasks.

The authors evaluate the approach with extensive experiments on five publicly available 3D segmentation datasets, which show its effectiveness.
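
The pre-text task described above (predicting the positions of the center and eight corners of a 3D crop) needs a position target for each crop. The sketch below shows one plausible way to build such targets. It is an illustration under stated assumptions, not the authors' implementation: the function name `crop_position_targets`, the `(z, y, x)` axis order, and the choice to normalize coordinates by the volume size are all hypothetical, and the summary does not say how the paper encodes positions.

```python
from itertools import product


def crop_position_targets(crop_start, crop_size, volume_shape):
    """Return the normalized positions of a 3D crop's center and 8 corners.

    All arguments are (z, y, x) triples in voxels. Normalizing by the
    volume shape is an assumption made for this sketch; the paper may
    encode positions differently.
    """
    for start, size, extent in zip(crop_start, crop_size, volume_shape):
        if start < 0 or size <= 0 or start + size > extent:
            raise ValueError("crop must lie entirely inside the volume")

    # Center of the crop along each axis, scaled to [0, 1].
    center = tuple(
        (start + size / 2) / extent
        for start, size, extent in zip(crop_start, crop_size, volume_shape)
    )

    # Each corner picks either the near (0) or far (1) face on every axis,
    # which yields 2**3 = 8 combinations.
    corners = [
        tuple(
            (start + offset * size) / extent
            for start, size, extent, offset in zip(
                crop_start, crop_size, volume_shape, offsets
            )
        )
        for offsets in product((0, 1), repeat=3)
    ]
    return center, corners


if __name__ == "__main__":
    center, corners = crop_position_targets((16, 32, 0), (32, 32, 32), (64, 128, 64))
    print("center:", center)
    for corner in corners:
        print("corner:", corner)
```

In a pre-training loop, the nine positions could serve as regression targets for a small prediction head on top of the encoder; whether the paper uses regression or another formulation is not stated in this summary.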