Beyond Adapting SAM: Towards End-to-End Ultrasound Image Segmentation via Auto Prompting

Xian Lin, Yangyang Xiang, Li Yu, Zengqiang Yan
2024-07-08
Abstract: End-to-end medical image segmentation is of great value for computer-aided diagnosis dominated by task-specific models, usually suffering from poor generalization. With recent breakthroughs brought by the segment anything model (SAM) for universal image segmentation, extensive efforts have been made to adapt SAM for medical imaging but still encounter two major issues: 1) severe performance degradation and limited generalization without proper adaptation, and 2) semi-automatic segmentation relying on accurate manual prompts for interaction. In this work, we propose SAMUS as a universal model tailored for ultrasound image segmentation and further enable it to work in an end-to-end manner denoted as AutoSAMUS. Specifically, in SAMUS, a parallel CNN branch is introduced to supplement local information through cross-branch attention, and a feature adapter and a position adapter are jointly used to adapt SAM from natural to ultrasound domains while reducing training complexity. AutoSAMUS is realized by introducing an auto prompt generator (APG) to replace the manual prompt encoder of SAMUS to automatically generate prompt embeddings. A comprehensive ultrasound dataset, comprising about 30k images and 69k masks and covering six object categories, is collected for verification. Extensive comparison experiments demonstrate the superiority of SAMUS and AutoSAMUS against the state-of-the-art task-specific and SAM-based foundation models. We believe the auto-prompted SAM-based model has the potential to become a new paradigm for end-to-end medical image segmentation and deserves more exploration. Code and data are available at https://github.com/xianlin7/SAMUS.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper targets two main problems with existing models for medical image segmentation:

1. **Performance degradation and limited generalization**: When general-purpose segmentation models such as the Segment Anything Model (SAM) are applied directly to medical images, the characteristics of those images (complex shapes, blurred boundaries, small targets, low contrast) cause severe performance degradation and limited generalization.
2. **Semi-automatic segmentation relying on manual prompts**: Existing SAM-based medical image segmentation methods require a human to supply task-related prompts (points, bounding boxes, etc.), so the process is only semi-automatic, inflexible, and difficult to apply at scale in clinical settings.

To address these problems, the authors propose a universal model named SAMUS and extend it to AutoSAMUS for end-to-end ultrasound image segmentation:

- **SAMUS**: A parallel CNN branch supplements local information through cross-branch attention, while a feature adapter and a position adapter adapt SAM from natural images to ultrasound images. Together these improve the model's performance and generalization.
- **AutoSAMUS**: An auto prompt generator (APG) replaces the manual prompt encoder and produces the prompt embeddings automatically, making segmentation fully automatic and end-to-end.

These changes are intended to make the model more efficient and flexible on medical images and better able to handle diverse clinical tasks.
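To make the cross-branch attention idea concrete, here is a minimal sketch in plain Python. It is not the SAMUS implementation. It assumes the transformer-branch tokens act as queries and the CNN-branch features act as keys and values, and it omits the learned projections, multiple heads, and normalization a real model would use. The names `softmax` and `cross_branch_attention` are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_branch_attention(vit_tokens, cnn_tokens):
    """Hypothetical single-head cross-attention with no learned weights.

    vit_tokens: N query vectors of dimension d (assumed transformer branch).
    cnn_tokens: M key/value vectors of dimension d (assumed CNN branch).
    Returns N vectors: each query plus an attention-weighted mix of CNN tokens.
    """
    d = len(vit_tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in vit_tokens:
        # Scaled dot-product similarity between this query and every CNN token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in cnn_tokens]
        weights = softmax(scores)
        # Weighted sum of CNN tokens: the "local information" being injected.
        mixed = [sum(w * v[j] for w, v in zip(weights, cnn_tokens)) for j in range(d)]
        # Residual connection keeps the original transformer feature.
        fused.append([qi + mi for qi, mi in zip(q, mixed)])
    return fused


# Toy usage: two transformer tokens attend over three CNN tokens.
vit = [[1.0, 0.0], [0.0, 1.0]]
cnn = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused = cross_branch_attention(vit, cnn)
```

The residual addition reflects the summary's wording that the CNN branch supplements the transformer features rather than replacing them. Whether the published model combines the two branches in exactly this way is an assumption here.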