SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation

Guoan Wang, Jin Ye, Junlong Cheng, Tianbin Li, Zhaolin Chen, Jianfei Cai, Junjun He, Bohan Zhuang
2024-07-06
Abstract: Volumetric medical image segmentation is pivotal in enhancing disease diagnosis, treatment planning, and advancing medical research. While existing volumetric foundation models for medical image segmentation, such as SAM-Med3D and SegVol, have shown remarkable performance on general organs and tumors, their ability to segment certain categories in clinical downstream tasks remains limited. Supervised Finetuning (SFT) serves as an effective way to adapt such foundation models for task-specific downstream tasks but at the cost of degrading the general knowledge previously stored in the original foundation model. To address this, we propose SAM-Med3D-MoE, a novel framework that seamlessly integrates task-specific finetuned models with the foundational model, creating a unified model at minimal additional training expense for an extra gating network. This gating network, in conjunction with a selection strategy, allows the unified model to achieve comparable performance of the original models in their respective tasks, both general and specialized, without updating any parameters of them. Our comprehensive experiments demonstrate the efficacy of SAM-Med3D-MoE, with an average Dice performance increase from 53 to 56.4 on 15 specific classes. It especially gets remarkable gains of 29.6, 8.5, 11.2 on the spinal cord, esophagus, and right hip, respectively. Additionally, it achieves 48.9 Dice on the challenging SPPIN2023 Challenge, significantly surpassing the general expert's performance of 32.3. We anticipate that SAM-Med3D-MoE can serve as a new framework for adapting the foundation model to specific areas in medical image analysis. Codes and datasets will be publicly available.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a limitation of foundation models for 3D medical image segmentation: although existing models segment general organs and tumors well, their performance on specific clinical downstream tasks remains limited. Supervised Finetuning (SFT) is an effective way to adapt them, but it degrades the general knowledge stored in the original foundation model. To resolve this, the paper proposes a new framework, SAM-Med3D-MoE. Using a Mixture of Experts (MoE) design, it integrates task-specific finetuned models with the foundation model into a single unified model that performs well on both general and specialized tasks. The only additional training cost is a gating network, and none of the original models' parameters are updated. Specifically, a lightweight gating network processes the image and prompt embeddings and produces a confidence score for each task-specific expert. A novel selection strategy then adaptively combines the outputs of the general expert and the Top-1 task-specific expert to generate the final segmentation mask. This approach improves performance on new tasks while preserving performance on the original tasks, which mitigates "catastrophic forgetting". In the experiments, the average Dice score of SAM-Med3D-MoE on 15 specific categories rises from 53.2% to 56.4%, with particularly large gains of 29.6, 8.5, and 11.2 percentage points on the spinal cord, esophagus, and right hip, respectively. On the challenging SPPIN2023 Challenge, the model reaches a Dice score of 48.9%, well above the general expert's 32.3%. These results support the effectiveness of SAM-Med3D-MoE and its potential for adapting foundation models to specific areas of 3D medical image analysis.
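The gating and selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear gate, the `threshold` value, and the confidence-weighted blending rule are all assumptions made for illustration, and the function names (`gate_scores`, `select_and_combine`) are hypothetical. Flat lists of voxel logits stand in for 3D volumes, and the experts themselves are treated as frozen black boxes whose outputs are already given.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate_scores(image_emb, prompt_emb, weights, biases):
    """Hypothetical lightweight gate: a single linear layer over the
    concatenated image and prompt embeddings, followed by softmax.
    Returns one confidence score per task-specific expert."""
    features = list(image_emb) + list(prompt_emb)
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)

def select_and_combine(general_logits, expert_logits, scores, threshold=0.5):
    """Assumed selection rule (not taken from the paper):
    - pick the Top-1 task-specific expert by gate confidence;
    - if its confidence is below `threshold`, fall back to the general expert;
    - otherwise blend the general and Top-1 logits, weighted by confidence.
    Returns (binary mask, index of Top-1 expert, its confidence)."""
    top1 = max(range(len(scores)), key=scores.__getitem__)
    conf = scores[top1]
    if conf < threshold:
        fused = list(general_logits)
    else:
        fused = [
            (1.0 - conf) * g + conf * s
            for g, s in zip(general_logits, expert_logits[top1])
        ]
    mask = [v > 0.0 for v in fused]  # a voxel is foreground if its fused logit is positive
    return mask, top1, conf

# Toy usage: 3 task-specific experts, 2-d image and prompt embeddings, 4 voxels.
toy_weights = [[0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 4.0],
               [0.0, 0.0, 0.0, 0.0]]
toy_scores = gate_scores([1.0, 0.0], [0.0, 1.0], toy_weights, [0.0, 0.0, 0.0])
toy_general = [-1.0, -1.0, -1.0, -1.0]            # general expert predicts background
toy_experts = [[-1.0] * 4, [1.0] * 4, [-1.0] * 4]  # expert 1 predicts foreground
toy_mask, toy_top1, toy_conf = select_and_combine(toy_general, toy_experts, toy_scores)
print(toy_top1, round(toy_conf, 3), toy_mask)
```

Because only the gate's `weights` and `biases` would be trained in such a setup, the general expert and the task-specific experts stay untouched, which is the property the summary credits for avoiding catastrophic forgetting.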