MS2A2Net: Multiscale Self-Attention Aggregation Network for Few-Shot Aerial Imagery Segmentation
Jianzhao Li, Maoguo Gong, Weihao Li, Mingyang Zhang, Yourun Zhang, Shanfeng Wang, Yue Wu
DOI: https://doi.org/10.1109/tgrs.2023.3339666
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Few-shot aerial imagery segmentation is the task of segmenting specific objects in scenes not encountered during training, using only a small amount of annotated data for reference. However, most existing few-shot segmentation algorithms are designed primarily for natural images, and remote sensing aerial imagery remains underexplored. In this article, we propose a novel multiscale self-attention aggregation network, dubbed MS2A2Net, to address the challenges of few-shot aerial image segmentation in terms of both scarce data and network architecture. Specifically, we first incorporate the designed asymmetric momentum contrastive learning (AMCL) into the pre-training stage to improve the representation capability of the backbone without expensive labeled data. The frozen encoder is then transferred to the downstream few-shot segmentation task as the feature embedding. In terms of network architecture, we design self-attention aggregation in multiscale feature fusion to construct the dual correlation of foreground and background between support and query features at the pixel level. In addition, coordinate attention is designed to rearrange the distribution of feature importance along both the horizontal and vertical spatial directions, which facilitates adaptive fusion with the multiscale features. To verify the effectiveness of the proposed MS2A2Net, we also reconstructed two novel datasets dedicated to few-shot aerial image segmentation, called DLRSD-$4^{i}$ and iSAID-$4^{i}$. The experimental results show that MS2A2Net achieves superior segmentation performance on three few-shot aerial imagery segmentation benchmark datasets. Extensive ablation experiments also demonstrate the effectiveness and scalability of the proposed components and the overall network architecture.
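The pixel-level dual correlation between support and query features mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention from each query pixel to every support pixel, with the annotated support mask used as the attention value. The function name `dual_correlation` and the flat list-of-vectors layout are hypothetical choices made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_correlation(query_feats, support_feats, support_mask):
    """Hypothetical sketch: foreground/background priors for query pixels.

    query_feats:   list of C-dimensional feature vectors, one per query pixel
    support_feats: list of C-dimensional feature vectors, one per support pixel
    support_mask:  list of 0/1 labels (1 = foreground), one per support pixel
    Returns one (foreground, background) pair per query pixel.
    """
    dim = len(support_feats[0])
    scale = 1.0 / math.sqrt(dim)  # standard scaled dot-product factor
    priors = []
    for q in query_feats:
        # Similarity of this query pixel to every support pixel.
        scores = [scale * sum(a * b for a, b in zip(q, s)) for s in support_feats]
        weights = softmax(scores)
        # Attention-weighted share of foreground support pixels.
        fg = sum(w * m for w, m in zip(weights, support_mask))
        # The background share is the remaining attention mass.
        priors.append((fg, 1.0 - fg))
    return priors
```

A query pixel whose feature resembles the foreground support pixels receives a foreground prior near 1, and one resembling the background receives a prior near 0. In a multiscale setting, a routine of this kind would be applied to the feature maps at each scale before fusion.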