Abstract: Massive and diverse remote sensing data create opportunities for real-world data-driven tasks, but they also pose challenges for data processing and analysis, especially pixel-level image interpretation. Existing shallow-learning and deep-learning segmentation methods, limited by their technical bottlenecks, cannot properly balance accuracy and efficiency, and therefore scale poorly to practical remote sensing scenarios. Instead of relying on time-consuming deep stacks of local operations, as most state-of-the-art segmentation networks do, we propose a novel encoder-decoder segmentation model, dubbed XANet, which leverages the more computationally economical attention mechanism to boost performance. Two novel attention modules are proposed to strengthen the encoder and the decoder, respectively: the Attention Recalibration Module (ARM) and the Attention Fusion Module (AFM). Current attention modules focus only on elevating feature representation power and treat the spatial and channel enhancement of a feature map as two independent steps. In contrast, ARM gathers element-wise semantic descriptors that couple spatial and channel information to directly generate a 3D attention map for feature enhancement, and AFM uses the cross-attention mechanism to fuse multi-scale features sufficiently across the spatial and channel dimensions. Extensive experiments on the ISPRS and GID datasets comprehensively analyze XANet and explore the effects of ARM and AFM. The results demonstrate that XANet surpasses other state-of-the-art segmentation methods in both model performance and efficiency: ARM yields a larger improvement than existing attention modules at a competitive computational overhead, and AFM combines the complementary advantages of multi-level features while keeping efficiency in consideration.
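The two ideas named in the abstract, a 3D attention map and cross-attention fusion of multi-scale features, can be illustrated with a minimal NumPy sketch. This is not the ARM or AFM design, which the abstract does not specify. The function names, the mean-pooled context descriptors, the sigmoid gating, the residual fusion, and the requirement that both feature maps share a channel count are all assumptions made for illustration. The cross-attention part covers only attention over spatial positions, not channel fusion.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def factorized_attention(feat):
    """Baseline for contrast: channel and spatial gates as two separate steps."""
    channel_gate = sigmoid(feat.mean(axis=(1, 2)))      # shape (C,)
    out = feat * channel_gate[:, None, None]
    spatial_gate = sigmoid(out.mean(axis=0))            # shape (H, W)
    return out * spatial_gate[None, :, :]

def elementwise_3d_attention(feat):
    """Hypothetical 3D gate: one weight per (channel, row, column) element.

    The descriptor mixes each element with its channel-wise and spatial
    context (an assumption), so the gate has the full shape (C, H, W).
    """
    channel_ctx = feat.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1)
    spatial_ctx = feat.mean(axis=0, keepdims=True)       # (1, H, W)
    descriptor = feat + channel_ctx + spatial_ctx        # broadcasts to (C, H, W)
    attn = sigmoid(descriptor)
    return feat * attn, attn

def cross_attention_fuse(high_res, low_res):
    """Hypothetical fusion: high-res positions query low-res positions.

    Assumes both maps have the same number of channels C.
    """
    c, h, w = high_res.shape
    q = high_res.reshape(c, -1).T                        # (Nq, C) queries
    kv = low_res.reshape(c, -1).T                        # (Nk, C) keys and values
    scores = q @ kv.T / np.sqrt(c)                       # (Nq, Nk)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    fused = weights @ kv                                 # (Nq, C)
    return high_res + fused.T.reshape(c, h, w)           # residual add (assumption)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((4, 8, 8))   # toy high-resolution feature map
deep = rng.standard_normal((4, 4, 4))      # toy low-resolution feature map

enhanced, attn = elementwise_3d_attention(shallow)
fused = cross_attention_fuse(shallow, deep)
print(attn.shape, fused.shape)
```

The contrast to note is the gate shape: the factorized baseline produces a `(C,)` gate and an `(H, W)` gate in sequence, while the element-wise variant produces a single `(C, H, W)` map.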

D-CANet: Diverse Class-Aware Coding and Decoding Structure Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Context Aggregation Network for Remote Sensing Image Semantic Segmentation

LoG-CAN: local-global Class-aware Network for semantic segmentation of remote sensing images

LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery

Optimizing Spatial Relationships in GCN to Improve the Classification Accuracy of Remote Sensing Images

Hybridizing Cross-Level Contextual and Attentive Representations for Remote Sensing Imagery Semantic Segmentation

AANet: Adaptive Attention Networks for Semantic Segmentation of High-Resolution Remote Sensing Imagery

DCANet: Dense Context-Aware Network for Semantic Segmentation

Scale-Aware Neural Network for Semantic Segmentation of Multi-Resolution Remote Sensing Images

Densely Based Multi-Scale and Multi-Modal Fully Convolutional Networks for High-Resolution Remote-Sensing Image Semantic Segmentation

Remote Sensing Semantic Segmentation via Boundary Supervision-Aided Multiscale Channelwise Cross Attention Network

XANet: An Efficient Remote Sensing Image Segmentation Model Using Element-Wise Attention Enhancement and Multi-Scale Attention Fusion

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

THCANet: Two-layer hop cascaded asymptotic network for robot-driving road-scene semantic segmentation in RGB-D images

DSNet: Multi-resolution Dense Encoder and Stack Decoder Network for Aerial Image Segmentation

Object-Enhanced Semantic Segmentation Model for High-Resolution Remote Sensing Images

Hierarchical Self-Attention Embedded Neural Network With Dense Connection for Remote-Sensing Image Semantic Segmentation

DESENet: a bilateral network with detail-enhanced semantic encoder for real-time semantic segmentation

SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation

Global Context Dependencies Aware Network for Efficient Semantic Segmentation of Fine-Resolution Remoted Sensing Images

Unbalanced Class Learning Network With Scale-Adaptive Perception for Complicated Scene in Remote Sensing Images Segmentation