DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion

Xudong Lu, Yuqi Jiang, Haiwen Hong, Qi Sun, Cheng Zhuo
DOI: https://doi.org/10.1145/3664647.3681478
2024-01-01
Abstract: Multi-modality image fusion (MMIF) aims to integrate the complementary features of the source images, including target saliency and texture details, into a single fused image. Recently, image fusion methods that leverage diffusion models have demonstrated commendable results. Despite their strengths, diffusion models are weaker at perceiving local features. Additionally, their inherent mechanism of adding noise to the inputs leads to a loss of original information. To overcome these problems, we propose a novel Diffusion-CNN feature Aggregation Fusion (DCAFuse) network that extracts complementary features from two branches and aggregates them effectively. Specifically, we use the denoising diffusion probabilistic model (DDPM) in the diffusion-based branch to construct global information, and multi-scale convolutional kernels in the CNN-based branch to extract local detailed features. We then design a novel complementary feature aggregation module (CFAM). By constructing coordinate attention maps for the features, CFAM captures long-range dependencies in both the horizontal and vertical directions, thereby dynamically guiding the aggregation weights of the two branches. In addition, to further improve the complementarity of the dual-branch features, we introduce a novel loss function based on cosine similarity and a unique denoising timestep selection strategy. Extensive experimental results show that the proposed DCAFuse outperforms other state-of-the-art methods in multiple image fusion tasks, including infrared and visible image fusion (IVF) and medical image fusion (MIF).
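The abstract does not give the formulas for the cosine-similarity loss or for CFAM, so the following is only a minimal NumPy sketch of the two ideas under stated assumptions. The function names (`complementarity_loss`, `directional_scores`, `aggregate`), the squared-cosine form of the penalty, and the parameter-free directional pooling are all hypothetical stand-ins, not the authors' implementation. A real coordinate attention block would use learned convolutions and per-channel maps.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two feature maps, flattened to vectors."""
    a = a.ravel()
    b = b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def complementarity_loss(feat_diff, feat_cnn):
    """Assumed penalty: squared cosine similarity between branch features.

    It is zero when the two branches are orthogonal and approaches one
    when they are collinear, so minimizing it discourages redundancy.
    """
    return cosine_similarity(feat_diff, feat_cnn) ** 2

def directional_scores(feat):
    """Pool a (C, H, W) feature map along each spatial direction.

    Returns an (H, W) score map built from a per-row and a per-column
    descriptor, a parameter-free simplification of coordinate attention.
    """
    pooled_rows = feat.mean(axis=(0, 2))  # shape (H,): one value per row
    pooled_cols = feat.mean(axis=(0, 1))  # shape (W,): one value per column
    return pooled_rows[:, None] + pooled_cols[None, :]

def aggregate(feat_diff, feat_cnn):
    """Fuse two (C, H, W) branches with per-pixel softmax weights."""
    scores = np.stack([directional_scores(feat_diff),
                       directional_scores(feat_cnn)])  # shape (2, H, W)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    fused = weights[0][None] * feat_diff + weights[1][None] * feat_cnn
    return fused, weights

# Usage with random stand-ins for the diffusion and CNN branch features.
rng = np.random.default_rng(0)
feat_diff = rng.standard_normal((3, 4, 5))
feat_cnn = rng.standard_normal((3, 4, 5))
fused, weights = aggregate(feat_diff, feat_cnn)
loss = complementarity_loss(feat_diff, feat_cnn)
```

In this sketch the weights for the two branches sum to one at every pixel, so the fused map is a convex combination of the branch features. The loss would be added to the fusion objective with a weighting factor that the abstract does not specify.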