Abstract: The advancement of medical image segmentation has been propelled by the adoption of deep learning techniques, particularly UNet-based approaches, which exploit semantic information to improve segmentation accuracy. However, current UNet-based medical image segmentation approaches disregard the order of organs in scanned images. Furthermore, the inherent network structure of UNet provides no direct means of integrating temporal information. To integrate temporal information efficiently, we propose TP-UNet, which uses temporal prompts encompassing organ-construction relationships to guide the UNet segmentation model. Specifically, our framework features cross-attention and semantic alignment based on unsupervised contrastive learning to combine temporal prompts and image features effectively. Extensive evaluations on two medical image segmentation datasets demonstrate the state-of-the-art performance of TP-UNet. Our implementation will be open-sourced after acceptance.

What problem does this paper attempt to address?

The problem this paper addresses is that existing UNet-based medical image segmentation methods do not make full use of the temporal information in scanned images. Specifically, current methods overlook the order in which organs appear across a scan, and the inherent network structure of UNet cannot directly integrate temporal information. This limits how accurately the model can segment dynamically changing organs. To address this, the authors propose TP-UNet (Temporal Prompt Guided UNet), a framework that uses temporal prompts to guide a UNet model for medical image segmentation. TP-UNet introduces temporal prompts and combines a cross-attention mechanism with semantic alignment based on unsupervised contrastive learning, so that temporal information and image features are integrated effectively and segmentation accuracy improves. The key problems and solutions are as follows:

1. **Problem Description**:
   - Existing UNet-based methods ignore the temporal information in scanned images.
   - The network structure of UNet itself cannot directly process temporal information.
2. **Solution**:
   - Propose the TP-UNet framework, which uses temporal prompts to guide the model to learn the temporal information in medical images.
   - Design a two-stage process, semantic alignment followed by modality fusion, to narrow the semantic gap between text and image features and then aggregate these features effectively.
3. **Specific Implementation**:
   - **Temporal Prompt Module**: Generates temporal prompts from the appearance probabilities of organs at different time points, helping the model capture the temporal characteristics of the images.
   - **Multimodal Encoder**: Uses two text encoders, CLIP and Electra, to encode the temporal prompts, and combines them with the image features extracted by UNet.
   - **Semantic Alignment Module**: Aligns the semantic spaces of text and image features through unsupervised contrastive learning.
   - **Modality Fusion Module**: Uses cross-attention to fuse the temporal prompts and image features into a unified representation, which is fed into the UNet decoder.
4. **Experimental Results**:
   - Extensive experiments on two datasets, UW-Madison and LiTS 2017, verify the effectiveness of TP-UNet, which achieves new state-of-the-art performance.

With these components, TP-UNet makes better use of temporal information and improves the accuracy of medical image segmentation, especially for dynamically changing organs.

In summary, the paper aims to improve existing UNet-based medical image segmentation methods by introducing temporal information, thereby enhancing the accuracy and robustness of segmentation.
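The Temporal Prompt Module described above builds prompts from how likely each organ is to appear at a given point in the scan. The sketch below is only an illustration of that idea under stated assumptions, not the authors' code: it treats a slice's relative position in the volume as its "time", splits positions into a hypothetical number of bins, estimates per-bin organ appearance frequencies from training masks, and fills a hypothetical text template. The input format, `num_bins`, and the prompt wording are all assumptions.

```python
from collections import defaultdict


def organ_appearance_probs(volumes, num_bins=10):
    """Estimate P(organ present | relative slice position bin) from training labels.

    Hypothetical input format: each volume is a list of per-slice sets holding
    the names of organs present in that slice's ground-truth mask.
    """
    present = defaultdict(lambda: [0] * num_bins)
    totals = [0] * num_bins
    for slices in volumes:
        n = len(slices)
        for idx, organs in enumerate(slices):
            # Map the slice index to a normalized "time" bin in [0, num_bins).
            b = min(idx * num_bins // n, num_bins - 1)
            totals[b] += 1
            for organ in organs:
                present[organ][b] += 1
    # Relative frequency of each organ per bin (0.0 for empty bins).
    return {
        organ: [c / t if t else 0.0 for c, t in zip(counts, totals)]
        for organ, counts in present.items()
    }


def temporal_prompt(slice_idx, num_slices, probs, num_bins=10):
    """Turn the bin's appearance probabilities into a text prompt (template is an assumption)."""
    b = min(slice_idx * num_bins // num_slices, num_bins - 1)
    parts = [f"{organ} {p[b]:.0%}" for organ, p in sorted(probs.items())]
    return (f"slice at stage {b + 1} of {num_bins}; "
            "organ appearance probability: " + ", ".join(parts))


if __name__ == "__main__":
    train = [[{"stomach"}, {"stomach", "liver"}, {"liver"}, set()]]
    probs = organ_appearance_probs(train, num_bins=2)
    print(temporal_prompt(0, 4, probs, num_bins=2))
```

In a pipeline like the one summarized here, the resulting string would then be passed to a text encoder such as CLIP or Electra; binning by relative position is simply one plausible way to make scans of different lengths comparable.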
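The Semantic Alignment Module is described only as unsupervised contrastive learning between text and image features. One common way to realize that, and an assumption here rather than the paper's confirmed loss, is a CLIP-style symmetric InfoNCE objective: within a batch, each image embedding should be most similar to its own prompt embedding and dissimilar to the others. The pure-Python sketch below assumes the two encoders already produce equal-length vectors; `temperature=0.07` is an illustrative default.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row k of image_emb is the positive pair of row k of text_emb."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    # Cosine similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]

    def cross_entropy_diag(mat):
        # Mean cross-entropy where the correct "class" for row k is column k.
        loss = 0.0
        for k, row in enumerate(mat):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_sum - row[k]
        return loss / len(mat)

    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(img, img), info_nce(img, [[0.0, 1.0], [1.0, 0.0]]))
```

Matched pairs drive the loss toward zero, while mismatched pairs are heavily penalized, which is the sense in which such a loss pulls the two modalities into a shared semantic space before fusion.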
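The Modality Fusion Module fuses prompts and image features with cross-attention. A minimal single-head sketch follows, with several simplifying assumptions that are not taken from the paper: image tokens act as queries, text tokens as keys and values, the learned projections W_q, W_k, W_v are replaced by identities, and the attended text context is added back to each image token as a residual before it would go to the decoder.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(image_tokens, text_tokens):
    """Single-head scaled dot-product cross-attention (identity projections assumed).

    image_tokens: list of d-dim query vectors (e.g. flattened UNet feature positions).
    text_tokens:  list of d-dim key/value vectors (e.g. encoded temporal prompt tokens).
    Returns one fused d-dim vector per image token: query + attended text context.
    """
    d = len(image_tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in image_tokens:
        # Similarity of this image token to every text token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in text_tokens]
        weights = softmax(scores)
        # Weighted sum of text value vectors.
        context = [sum(w * v[j] for w, v in zip(weights, text_tokens)) for j in range(d)]
        # Residual connection keeps the original image information.
        fused.append([x + c for x, c in zip(q, context)])
    return fused


if __name__ == "__main__":
    print(cross_attention([[1.0, 0.0], [0.0, 1.0]], [[2.0, 3.0]]))
```

Using the image as the query side keeps the output at the image's spatial resolution, so the fused features can replace the original ones at the input of the UNet decoder; a real model would add learned projections, multiple heads, and normalization.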

TP-UNet: Temporal Prompt Guided UNet for Medical Image Segmentation
