GCtx-UNet: Efficient Network for Medical Image Segmentation

Khaled Alrfou, Tian Zhao
2024-06-10
Abstract: Medical image segmentation is crucial for disease diagnosis and monitoring. Though effective, current segmentation networks such as UNet struggle to capture long-range features. More accurate models such as TransUNet, Swin-UNet, and CS-UNet have higher computational complexity. To address this problem, we propose GCtx-UNet, a lightweight segmentation architecture that can capture global and local image features with accuracy better than or comparable to state-of-the-art approaches. GCtx-UNet uses a vision transformer that leverages global context self-attention modules joined with local self-attention to model long- and short-range spatial dependencies. GCtx-UNet is evaluated on the Synapse multi-organ abdominal CT dataset, the ACDC cardiac MRI dataset, and several polyp segmentation datasets. In terms of the Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD) metrics, GCtx-UNet outperformed CNN-based and Transformer-based approaches, with notable gains in the segmentation of complex and small anatomical structures. Moreover, GCtx-UNet is much more efficient than state-of-the-art approaches, with a smaller model size, lower computational workload, and faster training and inference, making it a practical choice for clinical applications.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses a key limitation in medical image segmentation: existing segmentation networks such as UNet struggle to capture long-range features. More accurate models (such as TransUNet, Swin-UNet, and CS-UNet) improve on this, but their high computational complexity limits practical use. To tackle this trade-off, the authors propose GCtx-UNet, a lightweight segmentation architecture that captures both global and local image features with accuracy better than or comparable to state-of-the-art approaches. GCtx-UNet uses a vision transformer that combines global context self-attention modules with local self-attention to model long-range and short-range spatial dependencies. On the Synapse multi-organ abdominal CT dataset, the ACDC cardiac MRI dataset, and several polyp segmentation datasets, GCtx-UNet outperforms CNN-based and Transformer-based methods in terms of Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD), with notable gains on complex and small anatomical structures. GCtx-UNet also has a smaller model size, a lower computational workload, and faster training and inference than those approaches, which makes it a practical choice for clinical applications.
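For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how DSC and HD can be computed for a pair of binary masks. It is an illustration only, not the authors' evaluation code: the function names `dice_coefficient` and `hausdorff_distance` are hypothetical, the Hausdorff distance is computed by brute force over all foreground pixels in pixel units, and the paper's exact variant (for example a percentile-based HD) is not specified in this summary.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between foreground pixels (brute force)."""
    p = np.argwhere(np.asarray(pred, dtype=bool))
    t = np.argwhere(np.asarray(target, dtype=bool))
    if len(p) == 0 or len(t) == 0:
        # Assumption: the distance is undefined for an empty mask; report infinity.
        return float("inf")
    # Pairwise Euclidean distances, shape (len(p), len(t)).
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Toy example: two 3x3 squares offset by one pixel along each axis.
pred = np.zeros((8, 8), dtype=bool)
pred[2:5, 2:5] = True
target = np.zeros((8, 8), dtype=bool)
target[3:6, 3:6] = True

dsc = dice_coefficient(pred, target)
hd = hausdorff_distance(pred, target)
print(f"DSC = {dsc:.4f}, HD = {hd:.4f}")
```

Higher DSC means better overlap, while lower HD means the predicted boundary stays closer to the reference boundary, which is why the two are typically reported together. The brute-force distance matrix is fine for small masks but grows quadratically with the number of foreground pixels, so a real evaluation would normally work on boundary pixels only or use a spatial index.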