SAM-SP: Self-Prompting Makes SAM Great Again

Chunpeng Zhou, Kangjie Ning, Qianqian Shen, Sheng Zhou, Zhi Yu, Haishuai Wang
2024-08-22
Abstract: The recently introduced Segment Anything Model (SAM), a Visual Foundation Model (VFM), has demonstrated impressive capabilities in zero-shot segmentation tasks across diverse natural image datasets. Despite its success, SAM suffers noticeable performance degradation when applied to specific domains, such as medical images. Current efforts to address this issue have involved fine-tuning strategies intended to bolster the generalizability of the vanilla SAM. However, these approaches still predominantly require domain-specific, expert-level prompts during the evaluation phase, which severely constrains the model's practicality. To overcome this limitation, we introduce a novel self-prompting based fine-tuning approach, called SAM-SP, tailored for extending the vanilla SAM model. Specifically, SAM-SP leverages the output from the previous iteration of the model itself as prompts to guide the subsequent iteration of the model. This self-prompting module endeavors to learn how to generate useful prompts autonomously and alleviates the dependence on expert prompts during the evaluation phase, significantly broadening SAM's applicability. Additionally, we integrate a self-distillation module to enhance the self-prompting process further. Extensive experiments across various domain-specific datasets validate the effectiveness of the proposed SAM-SP. Our SAM-SP not only alleviates the reliance on expert prompts but also exhibits superior segmentation performance compared to the state-of-the-art task-specific segmentation approaches, the vanilla SAM, and SAM-based approaches.
Computer Vision and Pattern Recognition, Artificial Intelligence, Emerging Technologies
What problem does this paper attempt to address?
This paper addresses how to reduce the dependence on expert-level prompts and improve segmentation performance when applying the Segment Anything Model (SAM) to specific domains (such as medical images and remote-sensing images). Although SAM performs excellently on zero-shot segmentation of natural images, its performance declines significantly in domain-specific applications because of the domain gap. Existing methods improve performance by fine-tuning on domain-specific data, but they still require accurate expert prompts at inference time, which limits the practical scope of SAM. The paper therefore proposes SAM-SP, a fine-tuning method based on self-prompting, which aims to let SAM generate useful prompts on its own, removing the dependence on expert prompts at inference and further expanding SAM's range of application. Specifically, SAM-SP uses the model output from the previous iteration as a prompt to guide the next iteration, and in this way learns to generate useful prompts automatically. In addition, a self-distillation module is introduced to further strengthen the self-prompting process. The main contributions of the paper are:
1. Emphasizing the importance of reducing the dependence on expert prompts when deploying visual foundation models in specific domains.
2. Proposing the SAM-SP framework, which enables the model to perform inference without user prompts by introducing a self-prompting module and a self-distillation module.
3. Constructing and releasing Seg-GPR, a new segmentation dataset for sub-layer disease detection, collected with 3D ground-penetrating radar.
4. Conducting extensive experiments on multiple public datasets, verifying that SAM-SP, without using any user prompts, outperforms the state-of-the-art task-specific segmentation methods, the original SAM, and SAM-based methods.
Through these improvements, SAM-SP not only reduces the dependence on expert prompts but also shows excellent segmentation performance on multiple downstream tasks.
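The self-prompting idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: `toy_model`, `self_prompted_segment`, and `distillation_loss` are hypothetical names, the toy model only stands in for a fine-tuned SAM, and treating the prompted second pass as the teacher in the distillation term is an assumption made for illustration.

```python
import math
from typing import Callable, List, Optional

# A mask is a grid of per-pixel foreground probabilities in [0, 1].
Mask = List[List[float]]
Model = Callable[[Mask, Optional[Mask]], Mask]


def toy_model(image: Mask, prompt: Optional[Mask]) -> Mask:
    """Hypothetical stand-in for a fine-tuned SAM (assumption, not the real model).

    Without a prompt it returns the pixel intensities as probabilities.
    With a prompt it blends each intensity with the binarized prompt,
    which pushes the prediction toward a sharper mask.
    """
    if prompt is None:
        return [list(row) for row in image]
    return [
        [0.5 * px + 0.5 * (1.0 if prompt[r][c] >= 0.5 else 0.0)
         for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]


def self_prompted_segment(model: Model, image: Mask, num_iters: int = 2) -> List[Mask]:
    """Run the model repeatedly, feeding each output back as the next prompt.

    The first pass receives no prompt at all, so no expert input is needed.
    """
    prompt: Optional[Mask] = None
    outputs: List[Mask] = []
    for _ in range(num_iters):
        mask = model(image, prompt)
        outputs.append(mask)
        prompt = mask  # the previous output becomes the next prompt
    return outputs


def distillation_loss(student: Mask, teacher: Mask, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy of student probabilities against teacher probabilities.

    Assumption: the prompted (later) pass acts as teacher for the unprompted pass.
    """
    total, count = 0.0, 0
    for s_row, t_row in zip(student, teacher):
        for s, t in zip(s_row, t_row):
            s = min(max(s, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))
            count += 1
    return total / count


# Usage: two passes over a tiny 2x2 "image", with no expert prompt supplied.
image = [[0.9, 0.2], [0.6, 0.4]]
first_pass, second_pass = self_prompted_segment(toy_model, image, num_iters=2)
loss = distillation_loss(first_pass, second_pass)
```

In a real training setup the loss would be combined with a supervised segmentation loss and back-propagated through the fine-tuned parameters; the sketch only shows how the prompt loop and a distillation term could fit together.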