Abstract: Semantic segmentation of high-resolution remote sensing images (HRSIs) is challenging because objects in HRSIs vary widely in scale and appearance. Although deep convolutional neural networks (DCNNs) have been widely applied to the semantic segmentation of HRSIs, they are inherently limited in capturing global context. Attention mechanisms and transformers can effectively model long-range dependencies, but they often incur high computational costs when applied to HRSIs. In this article, an encoder-decoder network (MSGCNet) is proposed to fully and efficiently model the multiscale context and long-range dependencies of HRSIs. Specifically, the multiscale interaction (MSI) module employs efficient cross-attention to facilitate interaction among the multiscale features of the encoder, which bridges the semantic gap between high- and low-level features and introduces more scale information into the network. To efficiently model long-range dependencies in both the spatial and channel dimensions, the transformer-based decoder block (TBDB) implements window-based efficient multihead self-attention (W-EMSA) and enables interactions across windows. Furthermore, to better integrate the global context generated by the TBDB, the scale-aware fusion (SAF) module is proposed to deeply supervise the decoder; it iteratively fuses hierarchical features through spatial attention. Quantitative and qualitative experimental results on two publicly available datasets show that the proposed MSGCNet outperforms currently popular methods. The code will be available at http://github.com/JingxiangZhou/MSGCNet.
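
To make the cost argument behind window-based attention concrete, the following is a minimal sketch of generic windowed self-attention. It is not the authors' W-EMSA: the function names are hypothetical, the query/key/value projections are replaced by the identity, only a single head is used, and the cross-window interaction described in the abstract is omitted. It only illustrates that restricting attention to non-overlapping windows reduces the number of pairwise scores from N^2 (for N = H x W tokens) to N x ws^2 for window size ws.

```python
import math

def window_partition(feat, ws):
    """Split an H x W grid of feature vectors into non-overlapping ws x ws windows."""
    h, w = len(feat), len(feat[0])
    assert h % ws == 0 and w % ws == 0, "H and W must be divisible by the window size"
    windows = []
    for top in range(0, h, ws):
        for left in range(0, w, ws):
            # Flatten the window row by row into a list of tokens.
            tokens = [feat[top + i][left + j] for i in range(ws) for j in range(ws)]
            windows.append(((top, left), tokens))
    return windows

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: Q, K and V are the tokens themselves (identity projections);
    a real model would use learned linear projections and several heads.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(wt * v[c] for wt, v in zip(weights, tokens)) for c in range(d)])
    return out

def window_self_attention(feat, ws):
    """Apply self-attention independently inside each ws x ws window."""
    h, w = len(feat), len(feat[0])
    result = [[None] * w for _ in range(h)]
    for (top, left), tokens in window_partition(feat, ws):
        attended = self_attention(tokens)
        # Write each attended token back to its original grid position.
        for idx, vec in enumerate(attended):
            result[top + idx // ws][left + idx % ws] = vec
    return result

def attention_score_count(h, w, ws=None):
    """Number of pairwise scores: global if ws is None, else windowed."""
    n = h * w
    return n * n if ws is None else n * ws * ws
```

For a 512 x 512 feature map, `attention_score_count(512, 512)` counts the scores of global attention and `attention_score_count(512, 512, 8)` those of 8 x 8 windows; the ratio between them is N / ws^2. Because each window is processed independently, a sketch like this cannot exchange information between windows, which is the gap the abstract says the TBDB closes by enabling interactions across windows.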

Aggregating Multi-Scale Contextual Features from Multiple Stages for Semantic Image Segmentation

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

Multi-Scale Context Intertwining For Semantic Segmentation

Multi-scale deep context convolutional neural networks for semantic segmentation

Multi-stage Context Refinement Network for Semantic Segmentation

Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation

Cross Aggregation Network for Semantic Segmentation

Multiscale Cascaded Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

Multi-Scale Context Aggregation for Semantic Segmentation of Remote Sensing Images

Cross-Scale Feature Propagation Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Unified Multi-scale Feature Abstraction for Medical Image Segmentation

Multiscale Global Context Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Multi-level graph convolutional recurrent neural network for semantic image segmentation

Muti-Scale Context-Aware Network for Cross-Domain Unsupervised Remote Sensing Image Semantic Segmentation

Remote Sensing Image Semantic Segmentation Network Based on Multi-Scale Feature Enhancement Fusion

MSCFNet: A Lightweight Network with Multi-Scale Context Fusion for Real-Time Semantic Segmentation

Multi‐stream Densely Connected Network for Semantic Segmentation

CAN: Contextual Aggregating Network for Semantic Segmentation

Multi-Level and Multi-Scale Feature Aggregation Network for Semantic Segmentation in Vehicle-Mounted Scenes

Multi-Scale Context Interaction Learning Network for Medical Image Segmentation