HitFusion: Infrared and Visible Image Fusion for High-Level Vision Tasks Using Transformer

Jun Chen, Jianfeng Ding, Jiayi Ma
DOI: https://doi.org/10.1109/tmm.2024.3405714
IF: 7.3
2024-10-19
IEEE Transactions on Multimedia
Abstract: This study proposes an innovative network to fuse infrared and visible images, called HitFusion, which uses the cross-feature transformer module and is compatible with high-level vision tasks. Firstly, existing image fusion approaches primarily concentrate on optimizing human visual perception and image metrics. To enhance the performance of the fusion network in subsequent high-level vision tasks, a segmentation network and a corresponding loss are introduced into the fusion network training process. Specifically, we devise a three-stage training strategy to render the fusion network more suitable for high-level vision tasks, guided by the segmentation network and broadening the fusion network's training set to boost its generalization capability. Secondly, current transformer-based image fusion methods neglect the interaction between visible texture features and infrared contrast features. To tackle this, the cross-feature transformer module is proposed, allowing the fusion network to learn the cross-feature correlation and long-range dependencies between source images, thus achieving fusion results with good complementarity. Finally, a dual-branch fusion network is proposed, based on the distinct characteristics of different images, that targets the extraction of deep features from source images utilizing contrast residual and texture enhancement modules to achieve improved fusion results. Extensive experimental results reveal that our HitFusion method excels in both qualitative and quantitative assessments, while also demonstrating superior performance in addressing high-level vision tasks.
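The cross-feature interaction described in the abstract can be illustrated with a generic cross-attention sketch, in which tokens from one modality query tokens from the other. This is not the HitFusion module itself: the abstract gives no architectural details, so the function names (`cross_attention`, `cross_feature_fuse`), the absence of learned projections, and the residual-then-average fusion rule are all assumptions made purely for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one modality and
    keys/values from the other. Each argument is a list of token vectors.
    No learned projections are used (an illustrative simplification)."""
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value tokens.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def cross_feature_fuse(ir_tokens, vis_tokens):
    """Hypothetical fusion rule: each modality attends to the other, the
    attended features are added residually, and the two branches are averaged.
    Assumes both inputs have the same number of tokens and the same width."""
    ir_from_vis = cross_attention(ir_tokens, vis_tokens, vis_tokens)
    vis_from_ir = cross_attention(vis_tokens, ir_tokens, ir_tokens)
    fused = []
    for ir, a, vis, b in zip(ir_tokens, ir_from_vis, vis_tokens, vis_from_ir):
        fused.append([0.5 * ((x + y) + (u + w)) for x, y, u, w in zip(ir, a, vis, b)])
    return fused


if __name__ == "__main__":
    # Toy token sequences standing in for infrared and visible features.
    ir = [[1.0, 0.0], [0.0, 1.0]]
    vis = [[0.5, 0.5], [0.2, 0.8]]
    print(cross_feature_fuse(ir, vis))
```

Because every attention weight is computed against the other modality's tokens, each output token can draw on any position in the other image, which is one common way to model the long-range, cross-modal dependencies the abstract refers to.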
computer science, information systems, telecommunications, software engineering