Abstract:Instance segmentation in remote sensing (RS) imagery aims to predict the locations of instances and represent them with pixel-level masks. Thanks to the more accurate pixel-level information for each instance, instance segmentation has enormous potential applications in resource planning, urban surveillance, and military reconnaissance. However, current RS imagery instance segmentation methods mostly follow the fully supervised paradigm, relying on expensive pixel-level labels. Moreover, remote sensing imagery suffers from cluttered backgrounds and significant variations in target scales, making segmentation challenging. To accommodate these limitations, we propose a semantic attention enhancement and structured model-guided multi-scale weakly supervised instance segmentation network (SASM-Net). Building upon the modeling of spatial relationships for weakly supervised instance segmentation, we further design the multi-scale feature extraction module (MSFE module), semantic attention enhancement module (SAE module), and structured model guidance module (SMG module) for SASM-Net to enable a balance between label production costs and visual processing. The MSFE module adopts a hierarchical approach similar to the residual structure to establish equivalent feature scales and to adapt to the significant scale variations of instances in RS imagery. The SAE module is a dual-stream structure with semantic information prediction and attention enhancement streams. It can enhance the network's activation of instances in the images and reduce cluttered backgrounds' interference. The SMG module can assist the SAE module in the training process to construct supervision with edge information, which can implicitly lead the model to a representation with structured inductive bias, reducing the impact of the low sensitivity of the model to edge information caused by the lack of fine-grained pixel-level labeling. Experimental results indicate that the proposed SASM-Net is adaptable to optical and synthetic aperture radar (SAR) RS imagery instance segmentation tasks. It accurately predicts instance masks without relying on pixel-level labels, surpassing the segmentation accuracy of all weakly supervised methods. It also shows competitiveness when compared to hybrid and fully supervised paradigms. This research provides a low-cost, high-quality solution for the instance segmentation task in optical and SAR RS imagery.

Index Your Position: A Novel Self-Supervised Learning Method for Remote Sensing Images Semantic Segmentation

ScoreSeg: Leveraging Score-Based Generative Model for Self-Supervised Semantic Segmentation of Remote Sensing

Self-Supervised Learning on Small In-Domain Datasets Can Overcome Supervised Learning in Remote Sensing

Simple and Efficient: A Semisupervised Learning Framework for Remote Sensing Image Semantic Segmentation

Remote Sensing Image Scene Classification with Self-Supervised Paradigm under Limited Labeled Samples

A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery

Semi-Supervised Semantic Segmentation of Remote Sensing Images With Iterative Contrastive Network

RSI-Net: Two-Stream Deep Neural Network for Remote Sensing Images-Based Semantic Segmentation

CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding

RiSSNet: Contrastive Learning Network with a Relaxed Identity Sampling Strategy for Remote Sensing Image Semantic Segmentation

SiamSeg: Self-Training with Contrastive Learning for Unsupervised Domain Adaptation Semantic Segmentation in Remote Sensing

In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification

Self-Supervised Learning in Remote Sensing: A review

SegMind: Semisupervised Remote Sensing Image Semantic Segmentation With Masked Image Modeling and Contrastive Learning Method

Global–Local Semantic Interaction Network for Salient Object Detection in Optical Remote Sensing Images With Scribble Supervision

Learning What and Where to Learn: A New Perspective on Self-supervised Learning

Semantic Attention and Structured Model for Weakly Supervised Instance Segmentation in Optical and SAR Remote Sensing Imagery

UGCNet: An Unsupervised Semantic Segmentation Network Embedded With Geometry Consistency for Remote-Sensing Images

Learning Self-supervised Low-Rank Network for Single-Stage Weakly and Semi-supervised Semantic Segmentation

SemiRS-COC: Semi-Supervised Classification for Complex Remote Sensing Scenes With Cross-Object Consistency

Enhancing Self-Supervised Learning for Remote Sensing with Elevation Data: A Case Study with Scarce And High Level Semantic Labels