Abstract:Massive and diverse remote sensing data provide opportunities for data-driven tasks in the real world, but also present challenges in terms of data processing and analysis, especially pixel-level image interpretation. However, the existing shallow-learning and deep-learning segmentation methods, bounded by their technical bottlenecks, cannot properly balance accuracy and efficiency, and are thus hardly scalable to the practice scenarios of remote sensing in a successful way. Instead of following the time-consuming deep stacks of local operations as most state-of-the-art segmentation networks, we propose a novel segmentation model with the encoder–decoder structure, dubbed XANet, which leverages the more computationally economical attention mechanism to boost performance. Two novel attention modules in XANet are proposed to strengthen the encoder and decoder, respectively, namely the Attention Recalibration Module (ARM) and Attention Fusion Module (AFM). Unlike current attention modules, which only focus on elevating the feature representation power, and regard the spatial and channel enhancement of a feature map as two independent steps, ARM gathers element-wise semantic descriptors coupling spatial and channel information to directly generate a 3D attention map for feature enhancement, and AFM innovatively utilizes the cross-attention mechanism for the sufficient spatial and channel fusion of multi-scale features. Extensive experiments were conducted on ISPRS and GID datasets to comprehensively analyze XANet and explore the effects of ARM and AFM. Furthermore, the results demonstrate that XANet surpasses other state-of-the-art segmentation methods in both model performance and efficiency, as ARM yields a superior improvement versus existing attention modules with a competitive computational overhead, and AFM achieves the complementary advantages of multi-level features under the sufficient consideration of efficiency.

CCENet: Cascade Class-Aware Enhanced Network for High-Resolution Aerial Imagery Semantic Segmentation.

Class Semantic Enhancement Network for Semantic Segmentation

AANet: Adaptive Attention Networks for Semantic Segmentation of High-Resolution Remote Sensing Imagery

CCANet: Cross-Modality Comprehensive Feature Aggregation Network for Indoor Scene Semantic Segmentation

Aerial-BiSeNet: A real-time semantic segmentation network for high resolution aerial imagery

Edge-Enhanced GCIFFNet: A Multiclass Semantic Segmentation Network Based on Edge Enhancement and Multiscale Attention Mechanism

Remote Sensing Semantic Segmentation via Boundary Supervision-Aided Multiscale Channelwise Cross Attention Network

D-CANet: Diverse Class-Aware Coding and Decoding Structure Network for Semantic Segmentation of High-Resolution Remote Sensing Images

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

XANet: An Efficient Remote Sensing Image Segmentation Model Using Element-Wise Attention Enhancement and Multi-Scale Attention Fusion

Hybridizing Cross-Level Contextual and Attentive Representations for Remote Sensing Imagery Semantic Segmentation

Efficient Multi-scale Network for Semantic Segmentation of fine-Resolution Remotely Sensed Images

HCNet: Hierarchical Context Network for Semantic Segmentation

DOCNet: Dual-Domain Optimized Class-Aware Network for Remote Sensing Image Segmentation

ABCNet: Attentive bilateral contextual network for efficient semantic segmentation of Fine-Resolution remotely sensed imagery

SACANet: scene-aware class attention network for semantic segmentation of remote sensing images

Context Aggregation Network for Remote Sensing Image Semantic Segmentation

SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images

CCNet: Criss-Cross Attention for Semantic Segmentation

Category-Based Interactive Attention and Perception Fusion Network for Semantic Segmentation of Remote Sensing Images