TTT-Unet: Enhancing U-Net with Test-Time Training Layers for Biomedical Image Segmentation

Rong Zhou, Zhengqing Yuan, Zhiling Yan, Weixiang Sun, Kai Zhang, Yiwei Li, Yanfang Ye, Xiang Li, Lifang He, Lichao Sun
2024-09-19
Abstract: Biomedical image segmentation is crucial for accurately diagnosing and analyzing various diseases. However, Convolutional Neural Networks (CNNs) and Transformers, the most commonly used architectures for this task, struggle to effectively capture long-range dependencies due to the inherent locality of CNNs and the computational complexity of Transformers. To address this limitation, we introduce TTT-Unet, a novel framework that integrates Test-Time Training (TTT) layers into the traditional U-Net architecture for biomedical image segmentation. TTT-Unet dynamically adjusts model parameters during the testing time, enhancing the model's ability to capture both local and long-range features. We evaluate TTT-Unet on multiple medical imaging datasets, including 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results demonstrate that TTT-Unet consistently outperforms state-of-the-art CNN-based and Transformer-based segmentation models across all tasks. The code is available at https://github.com/rongzhou7/TTT-Unet.
Image and Video Processing, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the difficulty that existing Convolutional Neural Networks (CNNs) and Transformer architectures have in capturing long-range dependencies in biomedical image segmentation. Specifically:

1. **Limitations of CNNs**: Although CNNs capture multi-scale features well, their inherent locality limits their ability to model long-range dependencies. This limitation is particularly evident when organ shape and size vary widely among patients.
2. **Limitations of Transformers**: Although Transformers naturally capture global context, their computational complexity is high, especially in dense biomedical image segmentation tasks. In addition, a fixed-size hidden state limits a model's ability to handle complex and subtle dependencies.

To overcome these limitations, the paper introduces a new framework, TTT-Unet. By integrating Test-Time Training (TTT) layers into the traditional U-Net architecture, it adjusts model parameters dynamically at test time, which enhances the model's ability to capture both local and long-range features.

### Main contributions

1. **Introduction of TTT-Unet**: An enhanced U-Net architecture that integrates TTT layers, allowing the model to perform self-supervised adaptation at test time and thereby capture long-range dependencies more effectively.
2. **Performance evaluation**: Extensive experiments were carried out on multiple medical imaging datasets, covering 3D abdominal organ segmentation (CT and MRI images), instrument segmentation in endoscopic images, and cell segmentation in microscopic images. The results show that TTT-Unet outperforms the existing state-of-the-art CNN and Transformer baseline models in all tasks.

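The self-supervised adaptation at test time mentioned in the contributions can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a linear inner model whose weight matrix `W` serves as the hidden state, a squared reconstruction loss, and one gradient step per token. The function names, the projection matrices `theta_k`, `theta_v`, `theta_q`, and the learning rate are all hypothetical choices made for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def ttt_linear_scan(tokens, theta_k, theta_v, theta_q, lr=0.25):
    """Toy test-time-training scan over a token sequence.

    The hidden state is the weight matrix W of a linear inner model.
    For every token, W takes one gradient step on the self-supervised
    loss ||W k - v||^2, where k and v are two projected views of the
    token; the output is then read out as W q with the updated W.
    """
    out_dim, in_dim = len(theta_v), len(theta_k)
    W = [[0.0] * in_dim for _ in range(out_dim)]  # assumed zero initialisation
    outputs, losses = [], []
    for x in tokens:
        k, v, q = matvec(theta_k, x), matvec(theta_v, x), matvec(theta_q, x)
        err = [p - t for p, t in zip(matvec(W, k), v)]  # residual W k - v
        losses.append(sum(e * e for e in err))
        # Gradient of ||W k - v||^2 with respect to W is 2 * err * k^T.
        W = [[w - lr * 2.0 * e * kj for w, kj in zip(row, k)]
             for row, e in zip(W, err)]
        outputs.append(matvec(W, q))
    return outputs, losses


# Feed the same token three times with identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, losses = ttt_linear_scan([[1.0, 0.0]] * 3, identity, identity, identity)
print(losses)
```

In this toy run the reconstruction loss shrinks with each repeated token, which is the sense in which the layer adapts its parameters while processing test inputs. A practical TTT layer would learn the projections during ordinary training and process tokens in mini-batches for efficiency.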
### Experimental results

Results are reported as Dice Similarity Coefficient (DSC), Normalized Surface Distance (NSD), and F1 score.

- **2D segmentation tasks**:
  - **Abdominal MRI organ segmentation**: TTT-Unet_Bot reaches a DSC of 0.7750 ± 0.1022 and an NSD of 0.8452 ± 0.1080; TTT-Unet_Enc reaches a DSC of 0.7725 ± 0.1044 and an NSD of 0.8540 ± 0.1032.
  - **Endoscopic image instrument segmentation**: TTT-Unet_Bot reaches a DSC of 0.6643 ± 0.3018 and an NSD of 0.6799 ± 0.3056; TTT-Unet_Enc reaches a DSC of 0.6696 ± 0.3018 and an NSD of 0.6820 ± 0.3080.
  - **Microscopic image cell segmentation**: TTT-Unet_Bot reaches an F1 score of 0.5818 ± 0.2410; TTT-Unet_Enc reaches an F1 score of 0.5773 ± 0.2435.
- **3D segmentation tasks**:
  - **Abdominal CT organ segmentation**: TTT-Unet_Bot reaches a DSC of 0.8709 ± 0.1011 and an NSD of 0.8995 ± 0.0721.
  - **Abdominal MRI organ segmentation**: TTT-Unet_Bot reaches a DSC of 0.8677 ± 0.0482 and an NSD of 0.9247 ± 0.0631.

### Conclusion

By dynamically adjusting model parameters at test time, TTT-Unet significantly improves performance in biomedical image segmentation, especially in scenarios with long-range dependencies and high anatomical variability. This makes TTT-Unet a flexible and highly adaptable solution with broad application prospects. However, the computational cost of test-time training remains a problem that needs further optimization.
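For readers unfamiliar with the DSC figures in the experimental results, the following minimal sketch shows how a Dice score is computed for one binary mask and how per-case scores might be aggregated into a "mean ± standard deviation" figure. It assumes flattened 0/1 masks and a population standard deviation; the paper's actual evaluation code, its multi-class handling, and its choice of standard deviation are not described here, so treat the helper name `dice_coefficient` and the aggregation as illustrative assumptions.

```python
from statistics import mean, pstdev


def dice_coefficient(pred, target):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks.

    Both masks are flat sequences of 0/1 values of equal length.
    Two empty masks are scored as 1.0 (a common convention, assumed here).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical per-case masks as (prediction, ground truth), made up for illustration.
cases = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),  # one overlapping pixel out of two each
    ([1, 1, 0, 0], [1, 1, 0, 0]),  # perfect overlap
]
scores = [dice_coefficient(p, t) for p, t in cases]
summary = f"{mean(scores):.4f} ± {pstdev(scores):.4f}"
print(summary)
```

NSD and the instance-level F1 score used for cell segmentation need surface-distance tolerances and object matching respectively, so they are not covered by this sketch.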