Abstract: Semantic segmentation using cross-modal data is an active research topic in Earth observation. Compared with single-modal strategies, cross-modal networks fuse complementary information from multiple sources and yield higher segmentation accuracy, which benefits applications such as urban planning and environmental monitoring. In this study, an end-to-end adaptive cross-modal fusion network (ACFNet) is proposed for semantic segmentation using high-resolution (HR) and light detection and ranging (LiDAR) images. Because sensor resolutions differ, data from different modalities differ in their ability to represent ground objects. Multimodal data fusion should therefore consider features at different spatial scales, whereas most existing methods simply fuse features at the same spatial scale. In this work, we first design an adaptive scale fusion module that automatically selects the features with the optimal spatial scales, making full use of their capacity to represent ground object details. Second, we design an important feature guidance module that evaluates the influence weights of deep semantic features and shallow spatial detail features, achieving adaptive fusion of deep and shallow features and reducing the dilution of semantic and spatial information caused by layer-by-layer upsampling and downsampling. Finally, we introduce a divide Fourier context learning (DFCL) module that transforms the feature maps from the spatial domain to the frequency domain. Compared with spatial convolution kernels, whose receptive fields are limited, the DFCL module can readily model the contextual dependencies of cross-modal features, which improves segmentation accuracy for complex urban ground objects, especially occluded ones. To demonstrate the generalisation performance of the proposed method, we conduct extensive experiments and ablation studies on three datasets: Potsdam, Vaihingen, and IEEE GRSS DFC 2018. The results show that the proposed ACFNet is effective for semantic segmentation.
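To illustrate why moving feature maps to the frequency domain helps with contextual modelling, the following is a minimal NumPy sketch of generic frequency-domain filtering. It is not the DFCL module itself, whose internal design the abstract does not specify. The function name `fourier_context`, the filter `freq_weights`, and the tensor shapes are assumptions made for illustration only.

```python
import numpy as np


def fourier_context(features, freq_weights):
    """Filter a feature map in the frequency domain (illustrative sketch).

    features:     real array of shape (C, H, W)
    freq_weights: complex array of shape (C, H, W // 2 + 1); a hypothetical
                  stand-in for weights that a network would learn
    """
    _, h, w = features.shape
    # Spatial domain -> frequency domain (real-input 2D FFT per channel).
    spectrum = np.fft.rfft2(features, axes=(-2, -1))
    # An element-wise product here equals a circular convolution over the
    # whole H x W map in the spatial domain.
    filtered = spectrum * freq_weights
    # Frequency domain -> spatial domain, restoring the original size.
    return np.fft.irfft2(filtered, s=(h, w), axes=(-2, -1))


rng = np.random.default_rng(0)
c, h, w = 4, 16, 16
x = rng.standard_normal((c, h, w))

# An all-ones filter leaves the features unchanged.
identity = fourier_context(x, np.ones((c, h, w // 2 + 1), dtype=complex))

# A random filter spreads a single active pixel over the entire map,
# i.e. every output position depends on every input position.
weights = (rng.standard_normal((c, h, w // 2 + 1))
           + 1j * rng.standard_normal((c, h, w // 2 + 1)))
impulse = np.zeros((c, h, w))
impulse[:, 0, 0] = 1.0
response = fourier_context(impulse, weights)
```

A 3×3 spatial kernel would spread the same impulse over at most nine positions per layer, whereas the single frequency-domain product above reaches the whole map in one step. This is the general property that frequency-domain context modules rely on.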

AFNet: Adaptive Fusion Network for Remote Sensing Image Semantic Segmentation

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

Multi-Scale Adaptive Feature Fusion Network For Semantic Segmentation In Remote Sensing Images

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

AFANet: A Multibackbone Compatible Feature Fusion Framework for Effective Remote Sensing Object Detection

Remote Sensing Image Semantic Segmentation Network Based on Multi-Scale Feature Enhancement Fusion

MFALNet: A Multiscale Feature Aggregation Lightweight Network for Semantic Segmentation of High-Resolution Remote Sensing Images

MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images

Adaptive Multiscale Deep Fusion Residual Network for Remote Sensing Image Classification

Scale-Aware Neural Network for Semantic Segmentation of Multi-Resolution Remote Sensing Images

MFVNet: a deep adaptive fusion network with multiple field-of-views for remote sensing image semantic segmentation

ASFNet: Adaptive Multiscale Segmentation Fusion Network for Real‐time Semantic Segmentation

HR and LiDAR Data Collaborative Semantic Segmentation Based on Adaptive Cross-Modal Fusion Network

AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation

BAFNet: Bilateral Attention Fusion Network for Lightweight Semantic Segmentation of Urban Remote Sensing Images

MFAFNet: A Lightweight and Efficient Network with Multi-Level Feature Adaptive Fusion for Real-Time Semantic Segmentation

MFCA-Net: a deep learning method for semantic segmentation of remote sensing images

MFCSNet: Multi-Scale Deep Features Fusion and Cost-Sensitive Loss Function Based Segmentation Network for Remote Sensing Images

(AF)2-S3Net: Attentive Feature Fusion with Adaptive Feature Selection for Sparse Semantic Segmentation Network

Semantic Segmentation of Remote-Sensing Images Based on Multiscale Feature Fusion and Attention Refinement