ACMFNet: Attention-Based Cross-Modal Fusion Network for Building Extraction of Remote Sensing Images
Baiyu Chen, Zongxu Pan, Jianwei Yang, Hui Long
DOI: https://doi.org/10.1109/tgrs.2024.3400979
IF: 8.2
2024-05-25
IEEE Transactions on Geoscience and Remote Sensing
Abstract: In recent years, the rapid development of deep learning (DL) has driven significant progress in extracting buildings from high spatial resolution (HSR) remote sensing images. However, existing methods still have limitations in preserving the detail integrity of building footprints. First, skip connections typically concatenate feature maps from adjacent levels directly, which inevitably leads to misalignment due to semantic differences. Second, integrating building-related details remains challenging for cross-modal remote sensing images. Third, the oversimplified upsampling structures used in previous methods may lose spatial details. In this article, we propose a novel building extraction method, the attention-based cross-modal fusion network (ACMFNet), which operates on cross-modal HSR remote sensing images and uses an encoder–decoder structure. First, we propose a global and local feature refinement module (GL-FRM) to refine features and establish contextual dependencies at multiple scales and levels, mitigating the spatial discrepancy among multilevel features. Meanwhile, a cross-modal fusion module integrates complementary features extracted from multispectral (MS) data and normalized digital surface model (nDSM) data. In addition, we employ a lightweight residual upsampling module (RUM) for feature resolution recovery. Comprehensive experiments on two benchmark datasets indicate that the proposed ACMFNet achieves state-of-the-art (SOTA) performance without bells and whistles.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
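
The abstract describes a cross-modal fusion module that integrates complementary MS and nDSM features but gives no internal details. The sketch below illustrates the general idea of attention-weighted fusion of two feature maps and should not be read as the paper's module. It assumes a simple squeeze-and-excitation style channel gate; the function name `channel_attention_fusion`, the weight matrices `w_ms` and `w_ndsm`, and all shapes are hypothetical choices made for illustration only.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_fusion(ms_feat, ndsm_feat, w_ms, w_ndsm):
    """Fuse two (C, H, W) feature maps with per-channel attention gates.

    Hypothetical illustration, not the ACMFNet module: each modality is
    scaled by a gate computed from the global descriptors of both inputs.
    w_ms and w_ndsm are assumed (C, 2C) projection matrices.
    """
    if ms_feat.shape != ndsm_feat.shape:
        raise ValueError("both modalities must share the same (C, H, W) shape")

    # Squeeze: global average pooling gives one descriptor per channel.
    ms_desc = ms_feat.mean(axis=(1, 2))        # shape (C,)
    ndsm_desc = ndsm_feat.mean(axis=(1, 2))    # shape (C,)
    joint = np.concatenate([ms_desc, ndsm_desc])  # shape (2C,)

    # Excite: project the joint descriptor to per-channel gates in (0, 1).
    gate_ms = sigmoid(w_ms @ joint)            # shape (C,)
    gate_ndsm = sigmoid(w_ndsm @ joint)        # shape (C,)

    # Fuse: gated sum, broadcasting each gate over the spatial dimensions.
    return (gate_ms[:, None, None] * ms_feat
            + gate_ndsm[:, None, None] * ndsm_feat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels, height, width = 4, 8, 8
    ms = rng.normal(size=(channels, height, width))    # stand-in MS features
    ndsm = rng.normal(size=(channels, height, width))  # stand-in nDSM features
    w_a = rng.normal(scale=0.1, size=(channels, 2 * channels))
    w_b = rng.normal(scale=0.1, size=(channels, 2 * channels))
    fused = channel_attention_fusion(ms, ndsm, w_a, w_b)
    print(fused.shape)
```

In a trained network the projection weights would be learned, so the gates could emphasize height cues from nDSM where spectral cues are ambiguous, and vice versa. With all-zero weights every gate equals 0.5 and the fusion reduces to a plain average of the two modalities.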