Multimodal Image Fusion Based on Diffusion Model

Bo Yang, Zhaohui Jiang, Dong Pan, Haoyang Yu, Weihua Gui
DOI: https://doi.org/10.1145/3677454.3677467
2024-01-01
Abstract: Existing fusion algorithms often struggle to preserve modality-specific features, which leads to color distortion and halo artifacts in the fused results. To address these problems, this paper proposes a multimodal image fusion method based on diffusion models. The method formulates multimodal fusion as a conditional generation problem within a denoising diffusion probabilistic model (DDPM) sampling framework, using multi-scale modality features to guide the iterative generation of the fused image. It consists of two key modules: a feature extraction module and a diffusion generation module. The feature extraction module uses a Swin Transformer network to extract complementary features from the multimodal images. The diffusion generation module implements a denoising network with forward and reverse diffusion processes in pixel space to learn the data distribution. To retain the original spectral characteristics of objects, strengthen the integration of modal information, and preserve color fidelity, we introduce a gradient consistency loss and an intensity consistency loss. Extensive experiments show that our method outperforms several state-of-the-art fusion techniques, particularly in color fidelity.
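The abstract does not give the loss formulas or the network details, so the following is only a minimal NumPy sketch of two ingredients it names, under stated assumptions. It shows the closed-form DDPM forward (noising) step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with the linear β schedule of the original DDPM paper assumed. It also shows one common way of writing intensity and gradient consistency losses in image fusion: the L1 distance to the pixel-wise maximum of the source intensities and of the source Sobel gradient magnitudes. The function names and the max-based targets are hypothetical and may differ from the paper's definitions. The Swin Transformer feature extractor and the denoising network are omitted.

```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed; taken from the original DDPM setup)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alphas_cumprod, noise):
    """Forward diffusion: sample x_t ~ q(x_t | x_0) in closed form."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise


def sobel_magnitude(img):
    """Gradient magnitude |gx| + |gy| from 3x3 Sobel kernels with edge padding."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    # Accumulate the 3x3 correlation one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.abs(gx) + np.abs(gy)


def intensity_loss(fused, src_a, src_b):
    """Hypothetical intensity consistency: L1 to the pixel-wise max of the sources."""
    return np.mean(np.abs(fused - np.maximum(src_a, src_b)))


def gradient_loss(fused, src_a, src_b):
    """Hypothetical gradient consistency: L1 to the pixel-wise max of source gradients."""
    target = np.maximum(sobel_magnitude(src_a), sobel_magnitude(src_b))
    return np.mean(np.abs(sobel_magnitude(fused) - target))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.random((8, 8))   # stand-in for an infrared image (random data)
    vis = rng.random((8, 8))  # stand-in for a visible luminance channel (random data)
    alphas_cumprod = np.cumprod(1.0 - linear_beta_schedule())
    x0 = np.maximum(ir, vis)
    x_t = q_sample(x0, 500, alphas_cumprod, rng.standard_normal(x0.shape))
    print("noised sample shape:", x_t.shape)
    print("intensity loss:", intensity_loss(x0, ir, vis))
    print("gradient loss:", gradient_loss(x0, ir, vis))
```

The max-based targets favor the brighter and sharper source at each pixel, which is one plausible reading of "preserving modal information". A weighted sum such as `intensity_loss + lam * gradient_loss`, with `lam` a hypothetical weight, is one way to combine the two terms.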