MAS-SAM: Segment Any Marine Animal with Aggregated Features

Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu
2024-05-09
Abstract: Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained on large-scale natural-light images. In underwater scenes, it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of SAM's decoder might lead to the loss of fine-grained object details. To address these issues, we propose a novel feature learning framework named MAS-SAM for marine animal segmentation, which integrates effective adapters into SAM's encoder and constructs a pyramidal decoder. More specifically, we first build a new SAM encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. When grafted with the Fusion Attention Module (FAM), our method can extract richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that MAS-SAM obtains better results than other typical segmentation methods. The source code is available at
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses the accurate segmentation of marine animals in complex underwater environments. Although the existing Segment Anything Model (SAM) performs excellently in natural-light scenes, its performance drops significantly underwater, where light scattering and absorption degrade image quality, reduce contrast, and blur objects. In addition, SAM's decoder is relatively simple, which may cause fine-grained object details to be lost. Both problems make the marine animal segmentation (MAS) task challenging for SAM.

To address these challenges, the paper proposes MAS-SAM, a feature-learning framework tailored to marine animal segmentation. MAS-SAM modifies SAM in three ways:

1. **Adapter-informed SAM Encoder (ASE)**: Effective adapters are added to SAM's encoder so that it can extract features specific to marine animal images.
2. **Hypermap Extraction Module (HEM)**: This module generates multi-scale feature maps that provide comprehensive guidance for the subsequent mask-prediction process.
3. **Progressive Prediction Decoder (PPD)**: The decoder gradually aggregates multi-source features from the original prompts, the ASE, and the HEM. This improves its representational ability and lets it capture information ranging from global context to fine-grained local details.

According to the paper, these changes enable MAS-SAM to achieve better results than other typical segmentation methods on four publicly available marine animal segmentation datasets.
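
The coarse-to-fine aggregation idea behind a progressive decoder can be illustrated with a small sketch. It is not the authors' PPD: the summary does not specify the layers, so the functions `upsample2x`, `fuse`, and `progressive_aggregate` are hypothetical names, the element-wise sum stands in for a learned fusion block, and the Fusion Attention Module is not modeled at all. The sketch assumes single-channel feature maps whose resolution doubles at each pyramid level.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, row-major


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour upsampling: each cell becomes a 2x2 block."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))  # duplicate the row
    return out


def fuse(coarse_up: Grid, fine: Grid) -> Grid:
    """Element-wise sum (assumption: stands in for a learned fusion block)."""
    if len(coarse_up) != len(fine) or len(coarse_up[0]) != len(fine[0]):
        raise ValueError("feature maps must have the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(coarse_up, fine)]


def progressive_aggregate(pyramid: List[Grid]) -> Grid:
    """Fuse maps ordered coarse -> fine, doubling resolution at each step."""
    if not pyramid:
        raise ValueError("pyramid must contain at least one map")
    merged = pyramid[0]
    for finer in pyramid[1:]:
        merged = fuse(upsample2x(merged), finer)
    return merged


# Toy pyramid: 1x1 global context, 2x2 mid-level map, 4x4 fine detail.
pyramid = [
    [[1.0]],
    [[0.0, 1.0], [2.0, 3.0]],
    [[float(r * 4 + c) for c in range(4)] for r in range(4)],
]
mask_logits = progressive_aggregate(pyramid)
```

Each output cell combines the global value, the mid-level value covering its region, and its own fine-scale value, which is the sense in which the result carries both global context and local detail. A real decoder would replace the sum with learned convolutions or attention and operate on multi-channel tensors.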