Multimodal Segformer for Flood Rapid Mapping with Sentinel-2 Data

Xiaoqiang Lu, Tong Gou, Zhongjian Huang, Yuting Yang, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu
DOI: https://doi.org/10.1109/igarss53475.2024.10642039
2024-01-01
Abstract: Flood rapid mapping products play an important role in informing flood emergency response and management. To this end, Track 2 of the 2024 IEEE GRSS Data Fusion Contest (DFC24-T2) establishes a multimodal benchmark for segmenting flood areas from Sentinel-2 multispectral images. However, imbalanced data distribution, data scarcity, and inter-modal differences severely limit the performance of deep-learning-based segmentation networks. In this work, we propose an end-to-end Multimodal Transformer-based Segmentation Network (MTSN) for accurate flood rapid mapping. MTSN first employs two Siamese encoders with shared parameters to accept the multimodal inputs and output their respective hierarchical multiscale features, which are then enriched by several channel attention blocks. Subsequently, a Cross-modal Feature Fusion Module (CFFM) based on a gating mechanism integrates the complementary multimodal features and generates informative representations. Finally, the fused features are decoded by a lightweight, pure multilayer perceptron (MLP) decoder to quickly generate flood-area maps. Moreover, we introduce offline data augmentation, semi-supervised learning, test-time augmentation, and multimodal post-processing to further boost the performance and generalization of MTSN. Experimental results and extensive ablations demonstrate the effectiveness of our method. Code is available at https://github.com/xiaoqiang-lu/MMSegFormer.
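The abstract does not spell out the form of the gate inside CFFM, so the following is only a minimal NumPy sketch of one common gated-fusion pattern, not the authors' implementation. It assumes two same-shaped feature maps of shape (C, H, W), a sigmoid gate computed from their channel-wise concatenation by a 1x1 projection, and a convex combination of the two inputs. The names `gated_fusion`, `weight`, and `bias` are hypothetical and chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(feat_a, feat_b, weight, bias):
    """Fuse two (C, H, W) feature maps with a learned per-element gate.

    Assumed form (not taken from the paper):
        gate  = sigmoid(W @ concat(feat_a, feat_b) + b)
        fused = gate * feat_a + (1 - gate) * feat_b

    weight has shape (C, 2C) and acts like a 1x1 convolution over the
    concatenated channels; bias has shape (C,).
    """
    stacked = np.concatenate([feat_a, feat_b], axis=0)       # (2C, H, W)
    logits = np.einsum("oc,chw->ohw", weight, stacked)       # (C, H, W)
    gate = sigmoid(logits + bias[:, None, None])             # values in (0, 1)
    return gate * feat_a + (1.0 - gate) * feat_b


# Toy inputs standing in for one scale of the two encoder outputs.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
feat_a = rng.normal(size=(C, H, W))
feat_b = rng.normal(size=(C, H, W))
weight = rng.normal(scale=0.1, size=(C, 2 * C))
bias = np.zeros(C)

fused = gated_fusion(feat_a, feat_b, weight, bias)
```

Because the gate lies strictly between 0 and 1, each fused value falls between the corresponding values of the two inputs; in a hierarchical encoder such a module would typically be applied once per feature scale before decoding.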