A Dual Cross Attention Transformer Network for Infrared and Visible Image Fusion

Zhuozhi Zhou, Jinhui Lan
DOI: https://doi.org/10.1109/icaibd62003.2024.10604480
2024-01-01
Abstract: The infrared and visible image fusion task aims to decompose and combine complementary information from both sensors. To overcome the lack of global intensity balance in fused images, we propose a joint transformer network with a feature enhancement layer and a stack cross attention (SCA) layer. First, axis-based self-attention layers are applied to extract shallow features. Then, the feature enhancement layer extracts features from spatial and channel perspectives and arranges them into stacks. Subsequently, the SCA layer employs cross attention for feature interaction between modalities and cross-layer attention for reassembling the feature stacks into a targeted pattern; these adaptively generate cross-modality and cross-layer relationships, respectively. Moreover, to tackle the issue of fusion being maintained by results-oriented metrics, we introduce a decomposition loss that constrains the above procedure by controlling cross-modality correlation. As a result, modality-specific and modality-general features are properly separated, which facilitates feature reconstruction in the decoder. Finally, qualitative results show that our method preserves abundant texture and precise intensity from the source images. Quantitative experimental results demonstrate that our fusion network achieves state-of-the-art fusion performance, especially in mutual information (MI).
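The cross attention between modalities mentioned in the abstract can be illustrated with a minimal sketch. This is generic single-head scaled dot-product cross attention, not the paper's SCA layer: the abstract does not specify the layer's internals, so the learned query/key/value projections are omitted, and the function names (`softmax`, `cross_attention`) and the toy token values are assumptions made purely for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    and keys/values come from the other (hypothetical helper; no learned
    projections, single head)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value tokens.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out

# Toy token features (assumed values): 2 infrared tokens, 3 visible tokens.
ir_tokens = [[1.0, 0.0], [0.0, 1.0]]
vis_tokens = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

# Each modality queries the other, giving interaction in both directions.
ir_attends_vis = cross_attention(ir_tokens, vis_tokens, vis_tokens)
vis_attends_ir = cross_attention(vis_tokens, ir_tokens, ir_tokens)
```

Each output token is a convex combination of the other modality's tokens, so the output keeps the query modality's token count while carrying information from the other modality.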