Abstract: Biomedical image segmentation is crucial for accurately diagnosing and analyzing various diseases. However, Convolutional Neural Networks (CNNs) and Transformers, the most commonly used architectures for this task, struggle to effectively capture long-range dependencies due to the inherent locality of CNNs and the computational complexity of Transformers. To address this limitation, we introduce TTT-Unet, a novel framework that integrates Test-Time Training (TTT) layers into the traditional U-Net architecture for biomedical image segmentation. TTT-Unet dynamically adjusts model parameters during the testing time, enhancing the model's ability to capture both local and long-range features. We evaluate TTT-Unet on multiple medical imaging datasets, including 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results demonstrate that TTT-Unet consistently outperforms state-of-the-art CNN-based and Transformer-based segmentation models across all tasks. The code is available at https://github.com/rongzhou7/TTT-Unet.

What problem does this paper attempt to address?

The paper addresses the difficulty that existing Convolutional Neural Networks (CNNs) and Transformer architectures have in capturing long-range dependencies in biomedical image segmentation. Specifically:

1. **Limitations of CNNs**: Although CNNs capture multi-scale features well, their inherent locality limits their ability to model long-range dependencies. This limitation is particularly evident when shape and size vary widely among patients.
2. **Limitations of Transformers**: Although Transformers naturally model global context, their computational complexity is high, especially in dense biomedical image segmentation tasks. In addition, the fixed-size hidden state limits their ability to handle complex and subtle dependencies.

To overcome these limitations, the paper introduces a new framework, TTT-Unet. It integrates Test-Time Training (TTT) layers into the traditional U-Net architecture so that model parameters are adjusted dynamically at test time, which enhances the model's ability to capture both local and long-range features.

### Main contributions

1. **Introduction of TTT-Unet**: An enhanced U-Net architecture that integrates TTT layers, allowing the model to perform self-supervised adaptation at test time and thereby capture long-range dependencies more effectively.
2. **Performance evaluation**: Extensive experiments on multiple medical imaging datasets, including 3D abdominal organ segmentation (CT and MRI images), instrument segmentation in endoscopic images, and cell segmentation in microscopic images. The results show that TTT-Unet outperforms the existing state-of-the-art CNN and Transformer baseline models in all tasks.
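The notion of a layer that adapts its own parameters at test time can be illustrated with a small sketch. The code below is not the TTT-Unet implementation; it assumes a linear inner model, a squared-error self-supervised loss, and one gradient step per token. The function name `ttt_linear_layer`, the projection matrices `theta_k`, `theta_v`, `theta_q`, the zero initialization, and the learning rate `lr` are all hypothetical choices made for illustration.

```python
import numpy as np

def ttt_linear_layer(tokens, theta_k, theta_v, theta_q, lr=0.1):
    """Toy test-time-training layer (illustrative assumption, not the paper's code).

    The hidden state W is the weight matrix of a linear inner model. For every
    incoming token, W takes one gradient step on a self-supervised loss
    ||W k - v||^2, where k and v are two projected views of the same token.
    """
    dim = tokens.shape[1]
    W = np.zeros((dim, dim))      # hidden state, zero-initialized by assumption
    outputs, losses = [], []
    for x in tokens:
        k = theta_k @ x           # "training" view of the token
        v = theta_v @ x           # "target" view of the token
        err = W @ k - v           # residual of the inner model before the update
        losses.append(float(err @ err))
        # Gradient of ||W k - v||^2 with respect to W is 2 * outer(err, k).
        W = W - lr * 2.0 * np.outer(err, k)
        # The output is produced with the freshly updated hidden state.
        outputs.append(W @ (theta_q @ x))
    return np.stack(outputs), losses

# Usage: feed the same unit-norm token five times with identity projections.
dim = 4
eye = np.eye(dim)
token = np.array([1.0, 0.0, 0.0, 0.0])
sequence = np.tile(token, (5, 1))
outputs, losses = ttt_linear_layer(sequence, eye, eye, eye, lr=0.1)
```

In this toy setting the self-supervised loss shrinks with every repetition of the token, because the hidden state keeps learning from the inputs it sees during inference rather than staying fixed after training.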
### Experimental results

Results are reported as mean ± standard deviation of the Dice similarity coefficient (DSC), the normalized surface distance (NSD), and the F1 score.

- **2D segmentation tasks**:
  - **Abdominal MRI organ segmentation**: TTT-Unet_Bot reaches a DSC of 0.7750 ± 0.1022 and an NSD of 0.8452 ± 0.1080; TTT-Unet_Enc reaches a DSC of 0.7725 ± 0.1044 and an NSD of 0.8540 ± 0.1032.
  - **Endoscopic image instrument segmentation**: TTT-Unet_Bot reaches a DSC of 0.6643 ± 0.3018 and an NSD of 0.6799 ± 0.3056; TTT-Unet_Enc reaches a DSC of 0.6696 ± 0.3018 and an NSD of 0.6820 ± 0.3080.
  - **Microscopic image cell segmentation**: TTT-Unet_Bot reaches an F1 score of 0.5818 ± 0.2410; TTT-Unet_Enc reaches an F1 score of 0.5773 ± 0.2435.
- **3D segmentation tasks**:
  - **Abdominal CT organ segmentation**: TTT-Unet_Bot reaches a DSC of 0.8709 ± 0.1011 and an NSD of 0.8995 ± 0.0721.
  - **Abdominal MRI organ segmentation**: TTT-Unet_Bot reaches a DSC of 0.8677 ± 0.0482 and an NSD of 0.9247 ± 0.0631.

### Conclusion

By dynamically adjusting model parameters at test time, TTT-Unet improves performance in biomedical image segmentation, especially in scenarios with long-range dependencies and high anatomical variability. This makes TTT-Unet a flexible and adaptable approach. However, the computational cost of test-time training remains an open problem that needs further optimization.
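For readers unfamiliar with the DSC values reported in the experimental results, the following minimal sketch shows how a Dice similarity coefficient is commonly computed for a pair of binary masks. It is a generic illustration, not the paper's evaluation code; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred, target).sum()
    return 2.0 * overlap / denom

# Usage: the prediction marks two pixels, one of which matches the single
# foreground pixel in the reference mask, so DSC = 2 * 1 / (2 + 1).
pred_mask = [[1, 1], [0, 0]]
true_mask = [[1, 0], [0, 0]]
score = dice_coefficient(pred_mask, true_mask)
```

A DSC of 1 means the predicted and reference masks coincide exactly, and a DSC of 0 means they do not overlap at all.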

TTT-Unet: Enhancing U-Net with Test-Time Training Layers for Biomedical Image Segmentation
