Text-prompt Camouflaged Instance Segmentation with Graduated Camouflage Learning

Zhentao He, Changqun Xia, Shengye Qiao, Jia Li
DOI: https://doi.org/10.1145/3664647.3681132
2024-01-01
Abstract: Camouflaged instance segmentation (CIS) aims to detect and segment objects that blend into their surroundings. Existing CIS methods rely heavily on fully supervised training with large amounts of precisely annotated data. This demands considerable annotation effort, yet these methods still struggle to segment highly camouflaged objects accurately. Although camouflaged objects are visually similar to the background, they differ from it semantically. Text associated with an image offers explicit semantic cues that underscore this difference. We therefore propose TPNet, the first text-prompt-based weakly supervised camouflaged instance segmentation method, which leverages these semantic distinctions for effective segmentation. TPNet operates in two stages: pseudo mask generation and self-training. In the first stage, we align text prompts with images using a language-image model to obtain region proposals containing camouflaged instances. A Semantic-Spatial Iterative Fusion module then integrates spatial information with semantic cues to iteratively refine the pseudo masks. In the second stage, Graduated Camouflage Learning, a self-training strategy, orders training from simple to complex images according to their camouflage level, providing an effective learning progression. Through the collaboration of the two stages, our method demonstrates a significant advancement in comprehensive experiments on two common benchmarks. It offers a novel solution that bridges the gap between weakly supervised learning and the segmentation of highly camouflaged instances.
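The abstract describes Graduated Camouflage Learning only as ordering training images from simple to complex by camouflage level. It does not say how that level is measured or how stages are scheduled. The sketch below is a hypothetical illustration of such a curriculum, not the authors' implementation. It assumes two things. First, camouflage level is approximated by how close the mean foreground colour (inside the stage-one pseudo mask) is to the mean background colour. Second, training proceeds in cumulative pools that add harder images at each stage. The names `Sample`, `camouflage_score`, `graduated_stages`, and `make_sample` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence

import numpy as np


@dataclass
class Sample:
    name: str
    image: np.ndarray        # H x W x 3, float RGB values in [0, 1]
    pseudo_mask: np.ndarray  # H x W bool mask (assumed to come from stage one)


def camouflage_score(image: np.ndarray, mask: np.ndarray) -> float:
    """Hypothetical camouflage level in [0, 1]: 1 minus the normalized distance
    between mean foreground and mean background colour. Higher = harder."""
    fg = image[mask]    # N_fg x 3 pixels inside the pseudo mask
    bg = image[~mask]   # N_bg x 3 pixels outside it
    if fg.size == 0 or bg.size == 0:
        return 1.0  # degenerate mask: treat as the hardest case
    dist = np.linalg.norm(fg.mean(axis=0) - bg.mean(axis=0))
    max_dist = np.sqrt(3.0)  # largest possible distance inside the RGB cube [0, 1]^3
    return float(1.0 - dist / max_dist)


def graduated_stages(samples: Sequence[Sample], num_stages: int) -> List[List[Sample]]:
    """Sort samples from least to most camouflaged and return one cumulative
    training pool per stage (stage k contains the easiest k/num_stages fraction)."""
    ordered = sorted(samples, key=lambda s: camouflage_score(s.image, s.pseudo_mask))
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = int(np.ceil(len(ordered) * k / num_stages))
        stages.append(list(ordered[:cutoff]))
    return stages


# Usage on three synthetic 4x4 images, each with a 2x2 pseudo mask in the centre.
def make_sample(name: str, fg_value: float, bg_value: float) -> Sample:
    image = np.full((4, 4, 3), bg_value, dtype=float)
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True
    image[mask] = fg_value
    return Sample(name, image, mask)


samples = [
    make_sample("hard", 0.50, 0.45),   # foreground almost matches background
    make_sample("easy", 1.00, 0.00),   # maximal contrast
    make_sample("medium", 0.80, 0.40),
]
stages = graduated_stages(samples, num_stages=3)
print([[s.name for s in stage] for stage in stages])
```

This prints `[['easy'], ['easy', 'medium'], ['easy', 'medium', 'hard']]`: each stage keeps the earlier, easier images and adds more camouflaged ones. A real system would likely use a learned or feature-based camouflage measure rather than raw colour contrast.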