Abstract: Few-shot aerial imagery segmentation is the task of segmenting specific objects in scenes not encountered during training, using only a small amount of annotated data for reference. However, most existing few-shot segmentation algorithms are designed primarily for natural images, and remote sensing aerial imagery remains underexplored. In this article, we propose a novel multiscale self-attention aggregation network, dubbed MS2A2Net, to address the challenges of few-shot aerial image segmentation in terms of both scarce data and network architecture. Specifically, we first incorporate the designed asymmetric momentum contrastive learning (AMCL) into the pre-training stage to improve the representation capability of the backbone without expensive labeled data. The frozen encoder is then transferred to the downstream few-shot segmentation task as the feature embedding. In terms of network architecture, we design self-attention aggregation in multiscale feature fusion to construct the dual correlation of foreground and background between support and query features at the pixel level. In addition, coordinate attention is designed to rearrange the distribution of feature importance along both the horizontal and vertical spatial directions, which facilitates adaptive fusion with the multiscale features. To verify the effectiveness of the proposed MS2A2Net, we also reconstructed two novel datasets dedicated to few-shot aerial image segmentation, called DLRSD-$4^{i}$ and iSAID-$4^{i}$. Experimental results on three few-shot benchmark aerial imagery segmentation datasets show that MS2A2Net achieves competitive segmentation performance. Extensive ablation experiments also reflect the effectiveness and scalability of the proposed components and the overall network architecture.
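To make the idea of a pixel-level foreground/background correlation between support and query features more concrete, the following is a minimal sketch and not the MS2A2Net implementation. It assumes features are already extracted as flat lists of per-pixel vectors, and it swaps the learned self-attention aggregation described in the abstract for plain cosine similarity. The function names `cosine` and `dual_correlation` and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def dual_correlation(query_feats, support_feats, support_mask):
    """For each query pixel, return (fg_score, bg_score).

    fg_score is the highest cosine similarity to any support pixel labeled
    foreground (mask == 1); bg_score is the same for background (mask == 0).
    This is an illustrative stand-in for a learned attention module.
    """
    fg = [f for f, m in zip(support_feats, support_mask) if m == 1]
    bg = [f for f, m in zip(support_feats, support_mask) if m == 0]
    scores = []
    for q in query_feats:
        fg_score = max((cosine(q, s) for s in fg), default=0.0)
        bg_score = max((cosine(q, s) for s in bg), default=0.0)
        scores.append((fg_score, bg_score))
    return scores


# Toy data (assumed): three support pixels with a binary mask, two query pixels.
support_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_mask = [1, 1, 0]
query_feats = [[1.0, 0.1], [0.1, 1.0]]

scores = dual_correlation(query_feats, support_feats, support_mask)
# Label a query pixel as foreground when it correlates more with support foreground.
pred = [1 if fg_score > bg_score else 0 for fg_score, bg_score in scores]
```

In this toy setup the first query pixel resembles the support foreground and the second resembles the support background, so comparing the two scores separates them. A learned model would replace the fixed similarity with attention weights computed at several feature scales.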

Multi-level Spatial Attention Network for Image Data Segmentation

MSAANet: Multi-scale Axial Attention Network for Medical Image Segmentation

Sparse Spatial Attention Network for Semantic Segmentation

Multi-scale Spatial Aggregation Network for Remote Sensing Image Segmentation

MSANet: an Improved Semantic Segmentation Method Using Multi-Scale Attention for Remote Sensing Images

MASANet: Multi-Angle Self-Attention Network for Semantic Segmentation of Remote Sensing Images

AANet: Adaptive Attention Networks for Semantic Segmentation of High-Resolution Remote Sensing Imagery

DSANet: Dilated Spatial Attention for Real-Time Semantic Segmentation in Urban Street Scenes

SACANet: scene-aware class attention network for semantic segmentation of remote sensing images

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

Adaptive multi-scale dual attention network for semantic segmentation

Multiscale Location Attention Network for Building and Water Segmentation of Remote Sensing Image

MS2A2Net: Multiscale Self-Attention Aggregation Network for Few-Shot Aerial Imagery Segmentation

Multi-Scale Attention Network for Building Extraction from High-Resolution Remote Sensing Images

AMNet: Convolutional Neural Network embeded with Attention Mechanism for Semantic Segmentation

Multi-Stage Fusion and Multi-Source Attention Network for Multi-Modal Remote Sensing Image Segmentation

Lightweight Attention Network for Very High-Resolution Image Semantic Segmentation

Semantic Segmentation Network with Multi-Path Structure, Attention Reweighting and Multi-Scale Encoding

Multiattention Network for Semantic Segmentation of Fine-Resolution Remote Sensing Images

Multi-Attention-Based Semantic Segmentation Network for Land Cover Remote Sensing Images

Scale-aware Attention Network for Weakly Supervised Semantic Segmentation