A Foreground-driven Fusion Network for Gully Erosion Extraction Utilizing UAV Orthoimages and Digital Surface Models

Yi Shen, Nan Su, Chunhui Zhao, Yiming Yan, Shou Feng, Yong Liu, Wei Xiang
DOI: https://doi.org/10.1109/tgrs.2024.3417398
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Unmanned aerial vehicle (UAV) orthoimages and digital surface models (DSMs) can provide valuable insights for semantic segmentation methods in comprehending gully erosion (GE) from diverse perspectives. While the integration of these two modalities has the potential to improve the GE extraction performance, the extent of enhancement primarily depends on the quality of modality-specific features and the synergistic fusion manner employed for integrating features from both modalities. Toward this end, we propose a novel multimodal segmentation method, which is called foreground-driven fusion network (FFNet). Guided by the prototypes of foreground objects (i.e., gullies), the network effectively tackles the challenges from the modality itself and between different modalities, ultimately achieving high-quality GE extraction results. Specifically, a foreground prototype sampling (FPS) module is first devised for precisely sampling foreground prototypes related to gullies from two modalities. Then, a local-global hybrid purification (LHP) module is proposed to effectively mitigate the erroneous activation within each modality at multiple dimensions by leveraging foreground prototypes. Finally, a multimodal foreground synergy (MFS) module is introduced to further activate foreground features and facilitate full complementarity between multimodal foreground features. To validate our network, a comprehensive multimodal dataset for GE extraction is constructed based on UAV orthoimages and DSMs from northeastern China. Furthermore, a public road extraction dataset is employed to evaluate the generalizability of this network. In the experiments conducted on these two datasets, the proposed FFNet exhibits obvious superiority, outperforming the second-best method with an average improvement of 2.55% in terms of intersection over union (IoU) and 2.77% in terms of $F_1$-score. These experimental results not only demonstrate the practicality of FFNet in GE extraction tasks, but also highlight its significant advantage in similar road extraction tasks.
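The two metrics quoted in the abstract, intersection over union (IoU) and F1-score, can be illustrated for a binary foreground mask with the minimal sketch below. It uses the standard definitions of both metrics; it is not taken from the paper's code, and the function name `iou_and_f1`, the nested-list mask format, and the handling of two empty masks are assumptions made only for this example.

```python
def iou_and_f1(pred, truth):
    """Return (IoU, F1) for two equally sized binary masks given as nested lists.

    Hypothetical helper for illustration; 1 marks foreground (e.g. gully), 0 background.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # foreground predicted and present
            elif p and not t:
                fp += 1          # foreground predicted but absent
            elif t and not p:
                fn += 1          # foreground present but missed
    union = tp + fp + fn
    if union == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0, 1.0
    iou = tp / union                      # TP / (TP + FP + FN)
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    return iou, f1


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
iou, f1 = iou_and_f1(pred, truth)
```

For a single pair of masks the two metrics are tied by F1 = 2·IoU / (1 + IoU), so they rank methods the same way on one image; averaged over a dataset they can differ, which is one reason both are commonly reported.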