Multiscale Progressive Text Prompt Network for Medical Image Segmentation

Xianjun Han, Qianqian Chen, Zhaoyang Xie, Xuejun Li, Hongyu Yang
2023-07-01
Abstract: The accurate segmentation of medical images is a crucial step in obtaining reliable morphological statistics. However, training a deep neural network for this task requires a large amount of labeled data to ensure high-accuracy results. To address this issue, we propose using progressive text prompts as prior knowledge to guide the segmentation process. Our model consists of two stages. In the first stage, we perform contrastive learning on natural images to pretrain a powerful prior prompt encoder (PPE). This PPE leverages text prior prompts to generate multimodality features. In the second stage, medical image and text prior prompts are sent into the PPE inherited from the first stage to achieve the downstream medical image segmentation task. A multiscale feature fusion block (MSFF) combines the features from the PPE to produce multiscale multimodality features. These two progressive features not only bridge the semantic gap but also improve prediction accuracy. Finally, an UpAttention block refines the predicted results by merging the image and text features. This design provides a simple and accurate way to leverage multiscale progressive text prior prompts for medical image segmentation. Compared with using only images, our model achieves high-quality results with low data annotation costs. Moreover, our model not only has excellent reliability and validity on medical images but also performs well on natural images. The experimental results on different image datasets demonstrate that our model is effective and robust for image segmentation.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses several key challenges in medical image segmentation:

1. **High Data Annotation Cost**: Training a deep neural network for medical image segmentation requires a large amount of annotated data to reach high-precision results. To reduce this requirement, the authors use text prompts as prior knowledge to guide the segmentation process.
2. **Multimodal Information Fusion**: Combining image and text information can improve segmentation quality. Text prompts are used to generate multimodal features, which are then fused with image features to refine the prediction.
3. **Semantic Gap Issue**: A semantic gap exists between natural data and medical data. To bridge it, the paper proposes a two-stage training process: contrastive learning pre-training is first conducted on natural images, and the pre-trained model is then applied to the medical image segmentation task.
4. **Efficient Segmentation Model**: The model structure is designed to capture contextual semantic information while maintaining high precision under limited computational resources. By combining Convolutional Neural Network (CNN) and Transformer modules, the model balances the extraction of local and global features.

In summary, the paper proposes a new method for medical image segmentation that introduces text prompts and contrastive learning, achieving high-quality segmentation results with lower data annotation costs.
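To make the contrastive pre-training idea from the first stage more concrete, here is a minimal sketch in plain Python. The summary does not state which loss the authors use, so this assumes a common symmetric image-text contrastive (InfoNCE-style) objective. The function names, the `temperature` value, and the toy feature vectors are all illustrative assumptions, not details from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired features.

    image_feats[i] and text_feats[i] are assumed to describe the same sample.
    The temperature is a hypothetical default, not a value from the paper.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch: two image features and two text features (illustrative values).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.1], [0.1, 1.0]]
swapped_texts = [[0.1, 1.0], [1.0, 0.1]]

matched = contrastive_loss(images, aligned_texts)
mismatched = contrastive_loss(images, swapped_texts)
```

The loss is small when each image feature is closest to its own text feature and large when the pairs are swapped, which is the signal that would drive an encoder to align the two modalities during pre-training.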