Swin-UMamba†: Adapting Mamba-based Vision Foundation Models for Medical Image Segmentation

Jiarun Liu, Hao Yang, Hong-Yu Zhou, Lequan Yu, Yong Liang, Yizhou Yu, Shaoting Zhang, Hairong Zheng, Shanshan Wang
DOI: https://doi.org/10.1109/tmi.2024.3508698
IF: 10.6
2024-01-01
IEEE Transactions on Medical Imaging
Abstract: Vision foundation models have shown great potential for improving generalizability and data efficiency, especially in medical image segmentation, where datasets are relatively small because of high annotation costs and privacy concerns. However, current research on foundation models relies predominantly on transformers. Their quadratic complexity and large parameter counts make these models computationally expensive, limiting their potential for clinical applications. In this work, we introduce Swin-UMamba†, a novel Mamba-based model for medical image segmentation that leverages the power of vision foundation models while remaining computationally efficient thanks to the linear complexity of Mamba. Moreover, we investigate and verify the impact of the vision foundation model on medical image segmentation, designing a self-supervised model adaptation scheme to bridge the gap between natural and medical data. Notably, Swin-UMamba† outperforms 7 state-of-the-art methods, including CNN-based, transformer-based, and Mamba-based approaches, across the AbdomenMRI, Endoscopy, and Microscopy datasets. The code and models are publicly available at: https://github.com/JiarunLiu/Swin-UMamba.
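The contrast the abstract draws between quadratic and linear complexity can be made concrete with a toy calculation. The sketch below is illustrative only: it is not the Mamba selective scan and not code from the Swin-UMamba repository. It assumes a simplified scalar recurrence, and the names `attention_pair_count`, `linear_scan`, `decay`, and `gain` are hypothetical. It compares how many token pairs full self-attention would score with how many steps a single recurrent pass takes over the same sequence.

```python
# Toy illustration of quadratic vs. linear scaling in sequence length.
# Assumption: a fixed scalar recurrence stands in for a state-space layer;
# this is NOT the Mamba selective scan or the Swin-UMamba implementation.


def attention_pair_count(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention: grows as N^2."""
    return num_tokens * num_tokens


def linear_scan(inputs, decay=0.9, gain=0.1):
    """Run h_t = decay * h_{t-1} + gain * x_t in one pass over the inputs.

    Returns the per-step states and the number of recurrence steps,
    which grows as N. `decay` and `gain` are hypothetical constants.
    """
    state = 0.0
    outputs = []
    steps = 0
    for x in inputs:
        state = decay * state + gain * x
        outputs.append(state)
        steps += 1
    return outputs, steps


if __name__ == "__main__":
    # Print sequence length, attention pair count, and scan step count.
    for n in (256, 1024, 4096):
        _, steps = linear_scan([1.0] * n)
        print(n, attention_pair_count(n), steps)
```

Under these assumptions, quadrupling the number of tokens multiplies the attention pair count by 16 but the scan step count by only 4, which is the scaling gap the abstract points to.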