TP-UNet: Temporal Prompt Guided UNet for Medical Image Segmentation

Ranmin Wang, Limin Zhuang, Hongkun Chen, Boyan Xu, Ruichu Cai
2024-11-20
Abstract: The advancement of medical image segmentation techniques has been propelled by the adoption of deep learning techniques, particularly UNet-based approaches, which exploit semantic information to improve the accuracy of segmentations. However, the order of organs in scanned images has been disregarded by current medical image segmentation approaches based on UNet. Furthermore, the inherent network structure of UNet does not provide direct capabilities for integrating temporal information. To efficiently integrate temporal information, we propose TP-UNet that utilizes temporal prompts, encompassing organ-construction relationships, to guide the segmentation UNet model. Specifically, our framework is featured with cross-attention and semantic alignment based on unsupervised contrastive learning to combine temporal prompts and image features effectively. Extensive evaluations on two medical image segmentation datasets demonstrate the state-of-the-art performance of TP-UNet. Our implementation will be open-sourced after acceptance.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the failure of existing UNet-based medical image segmentation methods to fully use the temporal information in scanned images. Specifically, current methods overlook the order of organs in the scanned images, and the UNet network structure has no direct way to integrate temporal information. This limits how accurately the model can segment dynamically changing organs.

To address this problem, the authors propose TP-UNet (Temporal Prompt Guided UNet), a framework that uses temporal prompts to guide the UNet model for medical image segmentation. TP-UNet integrates temporal information with image features by introducing temporal prompts and combining them with image features through cross-attention and through semantic alignment based on unsupervised contrastive learning, thereby improving segmentation accuracy.

The key problems and solutions proposed in the paper are as follows:

1. **Problem Description**:
   - Existing UNet-based methods ignore the temporal information in the scanned images.
   - The network structure of UNet itself cannot directly process temporal information.
2. **Solution**:
   - Propose the TP-UNet framework, which uses temporal prompts to guide the model to learn the temporal information in medical images.
   - Design a two-stage process, semantic alignment followed by modality fusion, to narrow the semantic gap between text and image features and to aggregate these features effectively.
3. **Specific Implementation**:
   - **Temporal Prompt Module**: Generate temporal prompts based on the appearance probabilities of organs at different time points to help the model understand the temporal characteristics of the images.
   - **Multimodal Encoder**: Use two text encoders, CLIP and Electra, to encode the temporal prompts, and combine the encoded prompts with the image features extracted by UNet.
   - **Semantic Alignment Module**: Align the semantic spaces of text and image features through unsupervised contrastive learning.
   - **Modality Fusion Module**: Use the cross-attention mechanism to fuse the temporal prompts and image features into a unified representation, which is fed to the UNet decoder.
4. **Experimental Results**:
   - Extensive experiments on two datasets, UW-Madison and LITS 2017, verify the effectiveness of TP-UNet, which achieves new state-of-the-art (SOTA) performance.

In summary, the paper aims to improve existing UNet-based medical image segmentation methods by introducing temporal information, thereby enhancing the accuracy and robustness of segmentation, especially for dynamically changing organs.
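
The two-stage idea described above (semantic alignment, then modality fusion) can be illustrated with a small sketch. This is not the authors' implementation, which had not been released at the time of the summary. It assumes an InfoNCE-style loss as one common form of contrastive learning, and a single-head scaled dot-product cross-attention without learned projections in which image tokens act as queries and prompt tokens act as keys and values. The function names `info_nce` and `cross_attention`, the temperature value, the residual connection, and the toy vectors are all hypothetical choices made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def info_nce(image_vecs, text_vecs, temperature=0.1):
    """Image-to-text InfoNCE loss (assumed form of the contrastive objective).

    The i-th image vector and the i-th text vector are treated as the
    positive pair; every other text vector in the batch is a negative.
    """
    imgs = [l2_normalize(v) for v in image_vecs]
    txts = [l2_normalize(v) for v in text_vecs]
    loss = 0.0
    for i, img in enumerate(imgs):
        logits = [dot(img, t) / temperature for t in txts]
        loss -= math.log(softmax(logits)[i])
    return loss / len(imgs)


def cross_attention(image_tokens, prompt_tokens):
    """Single-head cross-attention with no learned projections.

    Assumption: image tokens are the queries, prompt tokens are both keys
    and values, and the attended prompt is added back to the image token.
    """
    d = len(prompt_tokens[0])
    fused = []
    for q in image_tokens:
        scores = [dot(q, k) / math.sqrt(d) for k in prompt_tokens]
        weights = softmax(scores)
        attended = [
            sum(w * v[j] for w, v in zip(weights, prompt_tokens))
            for j in range(d)
        ]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


if __name__ == "__main__":
    # Toy 2-D features standing in for image and prompt embeddings.
    image_vecs = [[1.0, 0.0], [0.0, 1.0]]
    text_aligned = [[1.0, 0.0], [0.0, 1.0]]
    text_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(image_vecs, text_aligned) < info_nce(image_vecs, text_swapped))
    print(cross_attention([[1.0, 0.0]], text_aligned))
```

In this toy setup the correctly paired embeddings give a lower contrastive loss than the swapped ones, which is the behavior an alignment stage is meant to encourage. A real model would use learned query, key, and value projections and multiple heads before passing the fused features to the decoder.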