Abstract: Instance segmentation in remote sensing (RS) imagery aims to locate each instance and represent it with a pixel-level mask. Because it provides more accurate pixel-level information for each instance, instance segmentation has enormous potential in resource planning, urban surveillance, and military reconnaissance. However, current RS instance segmentation methods mostly follow the fully supervised paradigm and rely on expensive pixel-level labels. Moreover, RS imagery suffers from cluttered backgrounds and significant variations in target scale, which make segmentation challenging. To address these limitations, we propose a semantic attention enhancement and structured model-guided multi-scale weakly supervised instance segmentation network (SASM-Net). Building upon the modeling of spatial relationships for weakly supervised instance segmentation, we further design a multi-scale feature extraction module (MSFE module), a semantic attention enhancement module (SAE module), and a structured model guidance module (SMG module) for SASM-Net to balance label production cost against visual processing performance. The MSFE module adopts a hierarchical approach similar to the residual structure to establish equivalent feature scales and to adapt to the significant scale variations of instances in RS imagery. The SAE module is a dual-stream structure with a semantic information prediction stream and an attention enhancement stream; it strengthens the network's activation on instances in the image and reduces interference from cluttered backgrounds. The SMG module assists the SAE module during training by constructing supervision from edge information, which implicitly leads the model to a representation with a structured inductive bias and mitigates the model's low sensitivity to edges caused by the lack of fine-grained pixel-level labels. 
Experimental results indicate that the proposed SASM-Net is adaptable to optical and synthetic aperture radar (SAR) RS imagery instance segmentation tasks. It accurately predicts instance masks without relying on pixel-level labels, surpassing the segmentation accuracy of all weakly supervised methods. It also shows competitiveness when compared to hybrid and fully supervised paradigms. This research provides a low-cost, high-quality solution for the instance segmentation task in optical and SAR RS imagery.
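
As a rough illustration of the kind of edge information the SMG module is described as supervising with, the sketch below computes a Sobel gradient-magnitude map for a small grayscale image. The abstract does not state which edge operator SASM-Net uses, so the Sobel filter, the function name `sobel_edge_magnitude`, and the toy image are assumptions made purely for illustration, not the authors' method.

```python
from typing import List

Image = List[List[float]]

# Standard 3x3 Sobel kernels (an assumed choice of edge operator).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_magnitude(img: Image) -> Image:
    """Return the Sobel gradient magnitude; the 1-pixel border is left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# Toy 7x7 image: a bright 3x3 "instance" on a dark background.
toy = [[0.0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        toy[r][c] = 1.0

edges = sobel_edge_magnitude(toy)
```

In this toy case the response is zero in the flat interior of the square and non-zero along its boundary, which is the property that makes an edge map usable as an auxiliary training signal when pixel-level masks are unavailable.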

Semantic Segmentation of High-Resolution Remote Sensing Images Based on Sparse Self-Attention and Feature Alignment

Semantic Segmentation With Attention Mechanism for Remote Sensing Images

Hierarchical Self-Attention Embedded Neural Network With Dense Connection for Remote-Sensing Image Semantic Segmentation

SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images

Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images

Semantic Segmentation of Remote Sensing Image Based on Regional Self-Attention Mechanism

Lightweight Attention Network for Very High-Resolution Image Semantic Segmentation

AANet: Adaptive Attention Networks for Semantic Segmentation of High-Resolution Remote Sensing Imagery

Semantic segmentation of remote sensing images combined with attention mechanism and feature enhancement U-Net

Hybridizing Cross-Level Contextual and Attentive Representations for Remote Sensing Imagery Semantic Segmentation

Multi-scale attention fusion network for semantic segmentation of remote sensing images

AANet: an Attention-Based Alignment Semantic Segmentation Network for High Spatial Resolution Remote Sensing Images

High-Resolution Remote Sensing Image Semantic Segmentation via Multiscale Context and Linear Self-Attention

MASANet: Multi-Angle Self-Attention Network for Semantic Segmentation of Remote Sensing Images

Semantic Segmentation for Multisource Remote Sensing Images Incorporating Feature Slice Reconstruction and Attention Upsampling

SPANet: Successive Pooling Attention Network for Semantic Segmentation of Remote Sensing Images

APNet: Attention Mechanism with Point Sampling Loss Network for Remote Sensing Images Semantic Segmentation

Semantic Labeling of High-Resolution Images Combining a Self-Cascaded Multimodal Fully Convolution Neural Network with Fully Conditional Random Field

A Synergistical Attention Model for Semantic Segmentation of Remote Sensing Images

Semantic Attention and Structured Model for Weakly Supervised Instance Segmentation in Optical and SAR Remote Sensing Imagery

Threshold Attention Network for Semantic Segmentation of Remote Sensing Images