Enhancing Multimodal Fusion with Only Unimodal Data

Wenqi Han, Jie Geng, Xinyang Deng, Wen Jiang
DOI: https://doi.org/10.1109/igarss53475.2024.10641451
2024-01-01
Abstract: With recent advances in remote sensing technology, a wealth of multimodal data is available for applications. However, domain differences between modalities and alignment challenges in practical applications make it both important and difficult to integrate these data effectively. In this paper, we propose a multimodal prototype representation fusion network (MPRFN) for SAR and optical image fusion segmentation. Specifically, the network constructs multimodal category prototype representations that better capture the characteristics and distribution of each modality, providing a more robust multimodal feature representation. In addition, a prototype-consistent semi-supervised learning method is proposed that uses a large number of unlabelled unimodal SAR images to improve multimodal fusion semantic segmentation. Experiments on SAR and optical multimodal datasets show that the proposed method achieves state-of-the-art performance.