Abstract: Although the current different types of SAM adaptation methods have achieved promising performance for various downstream tasks, such as prompt-based ones and adapter-based ones, most of them belong to the one-step adaptation paradigm. In real-world scenarios, we are generally confronted with the dynamic scenario where the data comes in a streaming manner. Driven by the practical need, in this paper, we first propose a novel Continual SAM adaptation (CoSAM) benchmark with 8 different task domains and carefully analyze the limitations of the existing SAM one-step adaptation methods in the continual segmentation scenario. Then we propose a novel simple-yet-effective Mixture of Domain Adapters (MoDA) algorithm which utilizes the Global Feature Tokens (GFT) and Global Assistant Tokens (GAT) modules to help the SAM encoder extract well-separated features for different task domains, and then provide the accurate task-specific information for continual learning. Extensive experiments demonstrate that our proposed MoDA obviously surpasses the existing classic continual learning methods, as well as prompt-based and adapter-based approaches for continual segmentation. Moreover, after sequential learning on the CoSAM benchmark with diverse data distributions, our MoDA maintains highly competitive results in the natural image domain, approaching the zero-shot performance of the original SAM, demonstrating its superior capability in knowledge preservation. Notably, the proposed MoDA can be seamlessly integrated into various one-step adaptation methods of SAM, which can consistently bring obvious performance gains. Code is available at https://github.com/yangjl1215/CoSAM
What problem does this paper attempt to address?
The paper asks how an image segmentation model such as the Segment Anything Model (SAM) can keep adapting to new tasks from a continuous data stream while retaining good performance on the tasks it has already learned. Most existing SAM adaptation methods follow a one-step adaptation paradigm: they fine-tune once for a fixed task and handle a changing sequence of tasks poorly. To study this setting, the authors propose a new benchmark, Continual SAM adaptation (CoSAM), for evaluating and improving SAM in continual learning scenarios.
### Core problems of the paper
1. **Limitations of existing methods**:
- Current SAM adaptation methods mainly focus on one-step adaptation, that is, fine-tuning once for a specific task.
- In practical applications, data usually arrives in a streaming manner, and the model needs to be continuously updated to adapt to new tasks without forgetting the knowledge of old tasks.
2. **Proposed new benchmark**:
- The CoSAM benchmark contains 8 different task domains, including industrial defect detection, medical imaging, and camouflaged object detection.
- The benchmark allows existing SAM adaptation algorithms to be evaluated systematically in continual learning scenarios.
3. **Proposed new method**:
- The Mixture of Domain Adapters (MoDA) algorithm uses Global Feature Tokens (GFT) and Global Assistant Tokens (GAT) modules to help the SAM encoder extract well-separated features for different task domains.
- MoDA alleviates catastrophic forgetting and provides accurate task-specific information, improving segmentation performance in continual learning scenarios.
### Evaluation metrics
- Mean Intersection over Union (IoU) and Boundary Intersection over Union (BIoU) after learning the t-th task, averaged over the t tasks seen so far:
\[
\text{IoU}_t=\frac{1}{t}\sum_{k = 1}^{t}\text{IoU}_{k,t}, \quad \text{BIoU}_t=\frac{1}{t}\sum_{k = 1}^{t}\text{BIoU}_{k,t}
\]
- Final Last-IoU and Last-BIoU, the mean scores after learning all N tasks:
\[
\text{Last-IoU}=\text{IoU}_N, \quad \text{Last-BIoU}=\text{BIoU}_N
\]
- Average forgetting measure (FF-IoU), averaged over the first N - 1 tasks after learning all N tasks:
\[
f_{k,t}=\max_{j\in\{1,\dots,t-1\}}\text{IoU}_{k,j}-\text{IoU}_{k,t}, \quad \text{FF-IoU}=\frac{1}{N-1}\sum_{k=1}^{N-1}f_{k,N}
\]
With these metrics, the paper evaluates how different methods perform in continual learning scenarios, and in particular how much catastrophic forgetting they suffer.
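The metrics above can be computed from a matrix of per-task scores. The following is a minimal sketch, not the authors' evaluation code. It assumes that `iou[k, t]` holds the IoU on task k measured after training through task t (0-indexed), and that the maximum in the forgetting term is taken only over steps at which task k has already been learned (j ≥ k). The function name `continual_metrics` and the example numbers are hypothetical.

```python
import numpy as np

def continual_metrics(iou):
    """Compute mean IoU per step, Last-IoU, and FF-IoU.

    Assumption: iou[k, t] is the IoU on task k after training through
    task t (0-indexed). Entries with k > t are never read.
    """
    iou = np.asarray(iou, dtype=float)
    n = iou.shape[0]

    # IoU_t: average over the tasks seen so far (rows 0..t of column t).
    mean_iou = np.array([iou[: t + 1, t].mean() for t in range(n)])

    # Last-IoU: the mean IoU after the final task.
    last_iou = float(mean_iou[-1])

    # f_{k,N}: best earlier score on task k minus its final score.
    # Assumption: only steps j >= k are considered, since task k has
    # not been trained on before step k.
    forgetting = [iou[k, k : n - 1].max() - iou[k, n - 1] for k in range(n - 1)]
    ff_iou = float(np.mean(forgetting)) if forgetting else 0.0

    return mean_iou, last_iou, ff_iou

# Hypothetical scores for a 3-task sequence (NaN marks unused entries).
example = np.array([
    [0.80, 0.70, 0.60],
    [np.nan, 0.90, 0.85],
    [np.nan, np.nan, 0.75],
])
mean_iou, last_iou, ff_iou = continual_metrics(example)
```

The same function applies to BIoU by passing a matrix of boundary IoU scores instead. A lower FF-IoU indicates less forgetting, while a higher Last-IoU indicates better overall performance at the end of the sequence.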
### Summary
By constructing the CoSAM benchmark and proposing the MoDA algorithm, the paper addresses the limitations of existing SAM adaptation methods in continual learning scenarios and improves the model's adaptability and knowledge retention when facing continuous data streams.