A Mamba-Diffusion Framework for Multimodal Remote Sensing Image Semantic Segmentation

Wen-Liang Du, Yang Gu, Jiaqi Zhao, Hancheng Zhu, Rui Yao, Yong Zhou
DOI: https://doi.org/10.1109/lgrs.2024.3476269
IF: 5.343
2024-11-01
IEEE Geoscience and Remote Sensing Letters
Abstract: Recent advances in deep learning have driven significant progress in multimodal remote sensing semantic segmentation. However, current methods struggle to maintain geometric consistency, particularly for large objects, which results in fragmented segmentation masks. We propose a Mamba-diffusion framework that preserves geometric consistency in segmentation masks by introducing a generative, diffusion-based semantic segmentation pipeline and a Mamba-based multimodal fusion model. The fusion model uses a double cross-fusion (DCF) module to fuse the multimodal images at multiple scales and under multiple scanning mechanisms. A dual-splitting structured state-space (DS-S4) model then further integrates the cross-modal information. Finally, the diffusion-based segmentation pipeline predicts semantic masks by progressively refining random Gaussian noise, guided by the fused multimodal features. Experiments on the WHU-OPT-SAR and Hunan datasets show that the proposed framework surpasses state-of-the-art (SOTA) methods by a considerable margin. Our code is available at https://github.com/WenliangDu/MambaDiffusion.
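The abstract states that masks are predicted by progressively refining Gaussian noise under the guidance of fused features, but it does not specify the sampler, noise schedule, step count, or network. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation. It assumes a deterministic DDIM-style sampler (eta = 0) with a cosine schedule and a denoiser that predicts the clean mask directly. The names `cosine_alpha_bar`, `ddim_sample_mask`, and `toy_denoiser` are illustrative; `toy_denoiser` merely stands in for the learned DCF/DS-S4 fusion model and mask decoder.

```python
import numpy as np

def cosine_alpha_bar(T: int, s: float = 0.008) -> np.ndarray:
    """Cumulative signal fraction abar_t for t = 0..T (abar_0 = 1), cosine schedule (assumed)."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]

def ddim_sample_mask(features, denoise, num_classes, steps=10, seed=0):
    """Refine Gaussian noise into a class mask, guided by fused features.

    features: fused multimodal feature map, shape (C_f, H, W)
    denoise:  hypothetical network (x_t, features, t) -> predicted clean mask x0, shape (K, H, W)
    """
    rng = np.random.default_rng(seed)
    _, h, w = features.shape
    abar = cosine_alpha_bar(steps)
    x = rng.standard_normal((num_classes, h, w))  # x_T ~ N(0, I)
    for t in range(steps, 0, -1):
        # The network predicts the clean mask from the noisy mask and the features
        x0_hat = np.clip(denoise(x, features, t), -1.0, 1.0)
        # Noise implied by the current sample and the predicted clean mask
        eps_hat = (x - np.sqrt(abar[t]) * x0_hat) / np.sqrt(1.0 - abar[t])
        # Deterministic DDIM step (eta = 0) from t to t - 1
        x = np.sqrt(abar[t - 1]) * x0_hat + np.sqrt(1.0 - abar[t - 1]) * eps_hat
    return x.argmax(axis=0)  # per-pixel class index

def toy_denoiser(x_t, features, t):
    # Hypothetical stand-in: maps the strongest feature channel to a {-1, 1} one-hot mask
    onehot = np.eye(features.shape[0])[features.argmax(axis=0)]  # (H, W, K)
    return 2.0 * onehot.transpose(2, 0, 1) - 1.0                 # (K, H, W)

feats = np.random.default_rng(1).standard_normal((6, 8, 8))  # K = 6 classes, arbitrary
mask = ddim_sample_mask(feats, toy_denoiser, num_classes=6, steps=10)
print(mask.shape)  # (8, 8)
```

Because abar_0 = 1, the last step returns the denoiser's final prediction exactly. The noise only matters through the intermediate x_t that a real, learned denoiser would condition on. With the toy denoiser, which ignores x_t, the mask therefore equals the per-pixel argmax of the features.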
Categories: Imaging Science & Photographic Technology; Remote Sensing; Engineering, Electrical & Electronic; Geochemistry & Geophysics