Not Just Learning from Others but Relying on Yourself: A New Perspective on Few-Shot Segmentation in Remote Sensing

Hanbo Bi, Yingchao Feng, Zhiyuan Yan, Yongqiang Mao, Wenhui Diao, Hongqi Wang, Xian Sun
2023-10-19
Abstract: Few-shot segmentation (FSS) is proposed to segment unknown class targets with just a few annotated samples. Most current FSS methods follow the paradigm of mining the semantics from the support images to guide the query image segmentation. However, such a pattern of 'learning from others' struggles to handle the extreme intra-class variation, preventing FSS from being directly generalized to remote sensing scenes. To bridge the gap of intra-class variance, we develop a Dual-Mining network named DMNet for cross-image mining and self-mining, meaning that it no longer focuses solely on support images but pays more attention to the query image itself. Specifically, we propose a Class-public Region Mining (CPRM) module to effectively suppress irrelevant feature pollution by capturing the common semantics between the support-query image pair. The Class-specific Region Mining (CSRM) module is then proposed to continuously mine the class-specific semantics of the query image itself in a 'filtering' and 'purifying' manner. In addition, to prevent the co-existence of multiple classes in remote sensing scenes from exacerbating the collapse of FSS generalization, we also propose a new Known-class Meta Suppressor (KMS) module to suppress the activation of known-class objects in the sample. Extensive experiments on the iSAID and LoveDA remote sensing datasets have demonstrated that our method sets the state-of-the-art with a minimum number of model parameters. Significantly, our model with the backbone of ResNet-50 achieves the mIoU of 49.58% and 51.34% on iSAID under 1-shot and 5-shot settings, outperforming the state-of-the-art method by 1.8% and 1.12%, respectively. The code is publicly available at https://github.com/HanboBizl/DMNet.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two main problems of few-shot segmentation (FSS) in remote sensing scenes:

1. **Large intra-class variation**: In remote sensing images, targets of the same category (for example, "roundabout" or "airplane") can differ significantly in appearance. This extreme intra-class variation makes it hard for existing FSS methods to extract features from the support images that transfer to the query image, which limits the model's generalization ability.
2. **Co-existence of multiple categories**: Remote sensing images often contain several categories at once, which further aggravates the generalization difficulties of FSS methods. If the model overfits the known categories during training, it may wrongly activate known-category regions when segmenting an image that contains unknown categories, producing inaccurate results.

To address these problems, the authors propose a dual-mining network named DMNet, which improves FSS performance in remote sensing scenes through cross-image mining and self-mining. DMNet contains three key modules:

1. **Class-public Region Mining (CPRM) module**:
   - This module captures the semantics shared by the support image and the query image of each support-query pair, suppressing interference from irrelevant features and reducing the impact of intra-class variation.
   - In terms of implementation, the CPRM module uses Position-based Class-public Region Mining (PCRM) and Channel-based Class-public Region Mining (CCRM) to model the semantic associations of the target category along the position and channel dimensions.
2. **Class-specific Region Mining (CSRM) module**:
   - This module mines the category semantics of the query image itself and uses them to guide the query's own segmentation.
   - By filtering and purifying the initial prediction, the CSRM module extracts the latent category semantics of the query image and uses them to progressively activate further target regions.
3. **Known-class Meta Suppressor (KMS) module**:
   - This module aims to alleviate the generalization collapse caused by overfitting to known categories.
   - During training, the KMS module continuously captures representative semantics of the known categories through an additional branch and stores them in a meta-memory. At test time, it uses the known-category prototypes together with the target-category prototype to suppress the activation of known categories in the query image.

With these designs, DMNet handles complex remote sensing scenes better than earlier FSS methods. The experiments show state-of-the-art performance on both the iSAID and LoveDA remote sensing datasets. In particular, on iSAID with a ResNet-50 backbone, DMNet reaches an mIoU of 49.58% in the 1-shot setting and 51.34% in the 5-shot setting, outperforming the previous state-of-the-art method by 1.8% and 1.12%, respectively.
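To make the contrast between 'learning from others' and self-mining concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it swaps in a generic prototype-matching formulation (masked average pooling plus cosine similarity) that is common in FSS, and it covers only a 'filtering' step, not the CPRM, CSRM, or KMS modules themselves. The function names, the flattened per-pixel feature lists, the toy data, and the `keep_threshold` value of 0.9 are all assumptions made for illustration.

```python
import math

def masked_average_prototype(features, mask):
    """Average the feature vectors of the pixels where mask is 1."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def similarity_map(features, prototype):
    """Score every pixel by its similarity to a class prototype."""
    return [cosine(f, prototype) for f in features]

def self_mine(query_features, initial_scores, keep_threshold=0.9):
    """Hypothetical self-mining step: keep only confident query pixels
    ('filtering'), pool a prototype from the query itself, and re-score."""
    confident = [1 if s >= keep_threshold else 0 for s in initial_scores]
    query_prototype = masked_average_prototype(query_features, confident)
    return similarity_map(query_features, query_prototype)

# Toy data: each pixel is a 2-D feature vector, flattened into a list.
support_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
support_mask = [1, 1, 0]  # first two pixels are the annotated target
# Query: a target pixel close to the support, a target pixel that looks
# different (intra-class variation), and a background pixel.
query_features = [[1.0, 0.2], [1.0, 1.0], [0.0, 1.0]]

# Cross-image step: guide the query with the support prototype.
support_prototype = masked_average_prototype(support_features, support_mask)
initial = similarity_map(query_features, support_prototype)

# Self-mining step: re-score the query with a prototype mined from itself.
refined = self_mine(query_features, initial)
```

In this toy example the second query pixel, which belongs to the target but differs from the support, scores higher after the self-mining step than under the support prototype alone. The background pixel's score rises slightly as well, which hints at why a further 'purifying' stage is needed; the sketch makes no attempt at that stage.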