Abstract: Massive and diverse remote sensing data create opportunities for data-driven tasks in the real world, but they also pose challenges for data processing and analysis, especially pixel-level image interpretation. Existing shallow-learning and deep-learning segmentation methods, limited by their technical bottlenecks, cannot properly balance accuracy and efficiency, and therefore scale poorly to practical remote sensing scenarios. Instead of relying on time-consuming deep stacks of local operations, as most state-of-the-art segmentation networks do, we propose a novel encoder–decoder segmentation model, dubbed XANet, which leverages the more computationally economical attention mechanism to boost performance. XANet introduces two novel attention modules, the Attention Recalibration Module (ARM) and the Attention Fusion Module (AFM), to strengthen the encoder and the decoder, respectively. Current attention modules focus only on elevating feature representation power and treat the spatial and channel enhancement of a feature map as two independent steps. In contrast, ARM gathers element-wise semantic descriptors that couple spatial and channel information to directly generate a 3D attention map for feature enhancement, and AFM uses the cross-attention mechanism to fuse multi-scale features across both the spatial and channel dimensions. Extensive experiments on the ISPRS and GID datasets comprehensively analyze XANet and explore the effects of ARM and AFM. The results demonstrate that XANet surpasses other state-of-the-art segmentation methods in both performance and efficiency: ARM yields a larger improvement than existing attention modules at a competitive computational overhead, and AFM combines the complementary advantages of multi-level features while remaining efficient.
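
To make the idea of a 3D attention map concrete, the following is a minimal NumPy sketch of element-wise recalibration in which every position of a (C, H, W) feature map receives its own weight. It is an illustration under stated assumptions and not the ARM described in the abstract: the function name `element_wise_attention`, the mean-pooled descriptors, and the additive coupling followed by a sigmoid are all hypothetical choices, since the abstract does not specify how the descriptors are computed.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def element_wise_attention(feat):
    """Toy 3D attention over a feature map of shape (C, H, W).

    Assumption: descriptors are plain averages and are coupled by
    addition; the actual ARM design is not given in the abstract.
    """
    # One descriptor per channel, shape (C, 1, 1).
    channel_desc = feat.mean(axis=(1, 2), keepdims=True)
    # One descriptor per spatial location, shape (1, H, W).
    spatial_desc = feat.mean(axis=0, keepdims=True)
    # Broadcasting couples both descriptors into one logit per
    # element, so the attention map has the full shape (C, H, W)
    # instead of being applied as two separate steps.
    attn = sigmoid(channel_desc + spatial_desc)
    return feat * attn, attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))  # hypothetical C=4, H=W=8
out, attn = element_wise_attention(feat)
print(out.shape, attn.shape)
```

The point of the sketch is only the shape of the attention map: a single (C, H, W) tensor of weights in (0, 1) rescales the features in one step, whereas a two-step design would apply a (C, 1, 1) channel map and a (1, H, W) spatial map one after the other.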

Paying Attention for Adjacent Areas: Learning Discriminative Features for Large-Scale 3D Scene Segmentation

ACNET: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation

Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation

Point Attention Network for Semantic Segmentation of 3D Point Clouds

Background-Aware 3D Point Cloud Segmentation with Dynamic Point Feature Aggregation

Dilated Nearest-Neighbor Encoding for 3D Semantic Segmentation of Point Clouds

LEARD-Net: Semantic segmentation for large-scale point cloud scene

AANet: Adaptive Attention Networks for Semantic Segmentation of High-Resolution Remote Sensing Imagery

XANet: An Efficient Remote Sensing Image Segmentation Model Using Element-Wise Attention Enhancement and Multi-Scale Attention Fusion

Scene Segmentation With Dual Relation-Aware Attention Network

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation

MEDANet: More Efficient Dual Attention Network for Scene Segmentation

FA-ResNet: Feature affine residual network for large-scale point cloud segmentation

Global Context Dependencies Aware Network for Efficient Semantic Segmentation of Fine-Resolution Remoted Sensing Images

DLA-Net: Learning Dual Local Attention Features for Semantic Segmentation of Large-Scale Building Facade Point Clouds

GA-NET: Global Attention Network for Point Cloud Semantic Segmentation

Semantic segmentation of large-scale point clouds based on dilated nearest neighbors graph

Point Attention Network for Point Cloud Semantic Segmentation

Adaptive multi-scale dual attention network for semantic segmentation

DOCNet: Dual-Domain Optimized Class-Aware Network for Remote Sensing Image Segmentation

Progressive Scene Segmentation Based on Self-Attention Mechanism