Interactive residual coordinate attention and contrastive learning for infrared and visible image fusion in triple frequency bands

Zhihua Xie, Sha Zong, Qiang Li, Peiqi Cai, Yaxiong Zhan, Guodong Liu
DOI: https://doi.org/10.1038/s41598-023-51045-9
IF: 4.6
2024-01-02
Scientific Reports
Abstract: Auto-encoder (AE) based image fusion models have achieved encouraging performance on infrared and visible image fusion. However, the loss of meaningful information in the encoding stage and the use of simple, unlearnable fusion strategies are two significant challenges for such models. To address these issues, this paper proposes an infrared and visible image fusion model based on an interactive residual attention fusion strategy and contrastive learning in the frequency domain. First, the source image is decomposed into three sub-bands (high-frequency, mid-frequency, and low-frequency) to obtain a powerful multiscale representation from the perspective of frequency spectrum analysis. To overcome the limitations of a straightforward fusion strategy, a learnable coordinate attention module is incorporated into the fusion layer to adaptively fuse representative information based on the characteristics of the corresponding feature maps. Moreover, contrastive learning is leveraged to train the multiscale decomposition network, enhancing the complementarity of information across the different frequency bands. Finally, a detail-preserving loss, a feature-enhancing loss, and a contrastive loss are combined to jointly train the entire fusion model so that details are well preserved. Qualitative and quantitative comparisons demonstrate the feasibility and validity of the model, which consistently generates fused images containing both highlighted targets and legible details, outperforming state-of-the-art fusion methods.
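To make the idea of a three-band frequency split concrete, the sketch below separates a grayscale image into low-, mid-, and high-frequency components using fixed radial masks in the Fourier domain. This is only an illustration under stated assumptions: the paper trains a multiscale decomposition network rather than using fixed masks, and the function name `split_three_bands` and the cutoffs `low_cut` and `high_cut` are hypothetical choices, not values from the paper.

```python
import numpy as np


def split_three_bands(image, low_cut=0.1, high_cut=0.3):
    """Split a 2-D image into (low, mid, high) frequency bands.

    Illustrative fixed-mask FFT split; NOT the learned decomposition
    network described in the abstract. Cutoffs are in cycles per pixel
    and are arbitrary assumptions.
    """
    h, w = image.shape
    # Per-axis normalized frequencies, broadcast into a radial distance map.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    spectrum = np.fft.fft2(image)

    # Three disjoint masks that together cover the whole spectrum.
    low_mask = radius < low_cut
    mid_mask = (radius >= low_cut) & (radius < high_cut)
    high_mask = radius >= high_cut

    # Inverse-transform each masked spectrum back to the spatial domain.
    low = np.fft.ifft2(spectrum * low_mask).real
    mid = np.fft.ifft2(spectrum * mid_mask).real
    high = np.fft.ifft2(spectrum * high_mask).real
    return low, mid, high
```

Because the masks partition the spectrum, the three bands sum back to the input image, so in this simplified setting no information is lost by the split itself.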
multidisciplinary sciences