HR and LiDAR Data Collaborative Semantic Segmentation Based on Adaptive Cross-Modal Fusion Network
Zhen Ye, Zhen Li, Nan Wang, Yuan Li, Wei Li
DOI: https://doi.org/10.1109/jstars.2024.3418387
IF: 4.715
2024-01-01
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Abstract: Semantic segmentation using cross-modal data is a hot topic in Earth observation. Compared with single-modal strategies, cross-modal networks fuse information from multiple aspects and yield higher segmentation accuracy, and they are widely used in urban planning, environmental monitoring, and other applications. In this study, an end-to-end adaptive cross-modal fusion network (ACFNet) is proposed for semantic segmentation using high-resolution (HR) and light detection and ranging (LiDAR) images. Because sensor resolutions differ, data of different modalities differ in their ability to represent ground objects. Multimodal data fusion should therefore consider features at different spatial scales, whereas most existing methods simply fuse features at the same spatial scale. In this work, we first design an adaptive scale fusion module that automatically selects features at the optimal spatial scales, making full use of the representation properties of ground-object details. Second, an important feature guidance module is designed to evaluate the influence weights of deep semantic features and shallow spatial-detail features, achieving adaptive fusion of deep and shallow features and reducing the dilution of semantic and spatial information caused by layer-by-layer downsampling and upsampling. Finally, we introduce a divide Fourier context learning (DFCL) module that transforms feature maps from the spatial domain to the frequency domain. Whereas current spatial convolution kernels have a limited receptive field, the DFCL module can readily model the contextual dependencies of cross-modal features, which improves segmentation accuracy for complex urban ground objects, especially occluded ones. To demonstrate the generalization performance of our method, we conduct extensive experiments and ablation studies on three datasets: Potsdam, Vaihingen, and IEEE GRSS DFC 2018. The results show that the proposed ACFNet is effective for semantic segmentation.
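The abstract states only that DFCL moves cross-modal feature maps from the spatial domain to the frequency domain to capture context beyond a convolution kernel's reach; it does not describe the module's internals. The following is a minimal NumPy sketch of that general idea, written in the style of a global-filter layer (real FFT, element-wise complex weighting, inverse FFT). This is a swapped-in technique, not the authors' DFCL. The function name `fourier_global_filter`, the weight shape, the channel-wise concatenation of HR and LiDAR features, and the random inputs are all hypothetical, and the "divide" step of DFCL is not modeled.

```python
import numpy as np


def fourier_global_filter(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Mix spatial context by element-wise filtering in the frequency domain.

    features: real array of shape (C, H, W). Here it is assumed to hold HR and
              LiDAR feature maps concatenated along the channel axis.
    weights:  complex array of shape (C, H, W // 2 + 1). In a network these
              would be learnable parameters (hypothetical in this sketch).
    """
    c, h, w = features.shape
    if weights.shape != (c, h, w // 2 + 1):
        raise ValueError("weights must have shape (C, H, W // 2 + 1)")
    # Spatial -> frequency domain: real FFT over the two spatial axes.
    spectrum = np.fft.rfft2(features, axes=(-2, -1))
    # One element-wise product here equals a circular convolution whose kernel
    # spans the whole H x W map, i.e. a global receptive field per channel.
    filtered = spectrum * weights
    # Frequency -> spatial domain, restoring the original spatial size.
    return np.fft.irfft2(filtered, s=(h, w), axes=(-2, -1))


# Usage with made-up inputs (hypothetical branch outputs, not real data).
rng = np.random.default_rng(0)
hr_feat = rng.standard_normal((4, 16, 16))     # hypothetical HR-branch features
lidar_feat = rng.standard_normal((2, 16, 16))  # hypothetical LiDAR-branch features
x = np.concatenate([hr_feat, lidar_feat], axis=0)  # shape (6, 16, 16)
w = rng.standard_normal((6, 16, 9)) + 1j * rng.standard_normal((6, 16, 9))
y = fourier_global_filter(x, w)  # shape (6, 16, 16)
```

Because a product in the frequency domain is a circular convolution in the spatial domain, changing a single input pixel changes every output pixel of that channel, whereas a 3 × 3 kernel would affect only a 3 × 3 neighborhood. The sketch filters each channel independently; a practical layer would presumably follow it with some channel mixing (for example, a 1 × 1 convolution) so that HR and LiDAR channels actually interact.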