Abstract: Infrared and visible image fusion has evolved from methods oriented purely toward visual perception to strategies that consider both visual perception and high-level vision tasks. However, existing task-driven methods fail to address the domain gap between semantic and geometric representations. To overcome this issue, we propose a high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation, termed HSFusion. Specifically, to minimize the gap between semantic and geometric representations, we design two separate domain transformation branches based on the CycleGAN framework, each comprising two processes: a forward segmentation process and a reverse reconstruction process. CycleGAN is capable of learning domain transformation patterns, and its reconstruction process is conducted under the constraint of these patterns. Thus, our method significantly facilitates the integration of semantic and geometric information and further reduces the domain gap. In the fusion stage, we integrate the infrared and visible features extracted from the reconstruction processes of the two separate CycleGANs to obtain the fused result. These features, which contain varying proportions of semantic and geometric information, can significantly enhance high-level vision tasks. Additionally, we generate masks based on the segmentation results to guide the fusion task. These masks provide semantic priors, and we design adaptive weights for the two distinct areas in the masks to facilitate image fusion. Finally, we conduct comparative experiments between our method and eleven state-of-the-art methods, demonstrating that our approach surpasses the others in both visual appeal and the semantic segmentation task.
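The abstract's idea of mask-guided fusion with separate weights for the two mask areas can be illustrated with a toy, pixel-level sketch. This is not the HSFusion network, which fuses learned features; the function names `region_weights` and `fuse`, and the rule that sets each weight to the infrared share of the region's total intensity, are assumptions made purely for illustration.

```python
# Toy illustration only: pixel-level blending with one weight per mask region.
# The weighting rule below is an assumption, not the rule used by HSFusion.

def region_weights(ir, vis, mask):
    """Return (w_fg, w_bg): the infrared blending weight inside (mask == 1)
    and outside (mask == 0) the segmentation mask.

    Assumed rule: the weight is the infrared share of the summed intensity
    in that region, falling back to 0.5 for an empty or all-zero region.
    """
    def ir_share(select):
        ir_sum = vis_sum = 0.0
        for ir_row, vis_row, m_row in zip(ir, vis, mask):
            for a, b, m in zip(ir_row, vis_row, m_row):
                if m == select:
                    ir_sum += a
                    vis_sum += b
        total = ir_sum + vis_sum
        return ir_sum / total if total > 0 else 0.5

    return ir_share(1), ir_share(0)


def fuse(ir, vis, mask):
    """Blend two same-sized grayscale images (nested lists of floats in [0, 1])
    using the region-dependent weights from region_weights."""
    w_fg, w_bg = region_weights(ir, vis, mask)
    fused = []
    for ir_row, vis_row, m_row in zip(ir, vis, mask):
        row = []
        for a, b, m in zip(ir_row, vis_row, m_row):
            w = w_fg if m == 1 else w_bg
            row.append(w * a + (1.0 - w) * b)
        fused.append(row)
    return fused


if __name__ == "__main__":
    ir = [[0.9, 0.1], [0.8, 0.2]]    # hypothetical infrared intensities
    vis = [[0.3, 0.7], [0.2, 0.6]]   # hypothetical visible intensities
    mask = [[1, 0], [1, 0]]          # 1 = target area, 0 = background
    print(region_weights(ir, vis, mask))
    print(fuse(ir, vis, mask))
```

In this sketch the left column, marked as the target area, leans toward the infrared image, while the background leans toward the visible image; a learned model would replace both the weighting rule and the pixel-level blend.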

HitFusion: Infrared and Visible Image Fusion for High-Level Vision Tasks Using Transformer

THFuse: An Infrared and Visible Image Fusion Network using Transformer and Hybrid Feature Extractor

HDCTfusion: Hybrid Dual-Branch Network Based on CNN and Transformer for Infrared and Visible Image Fusion

SDTFusion: A split-head dense transformer based network for infrared and visible image fusion

Rethinking Cross-Attention for Infrared and Visible Image Fusion

HATF: Multi-Modal Feature Learning for Infrared and Visible Image Fusion via Hybrid Attention Transformer

SFPFusion: An Improved Vision Transformer Combining Super Feature Attention and Wavelet-Guided Pooling for Infrared and Visible Images Fusion

DATFuse: Infrared and Visible Image Fusion via Dual Attention Transformer

MIFFuse: A Multi-Level Feature Fusion Network for Infrared and Visible Images

HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation

TDDFusion: A Target-Driven Dual Branch Network for Infrared and Visible Image Fusion

Infrared and Visible Image Fusion Based on a Two-Stage Class Conditioned Auto-Encoder Network

TCCFusion: An Infrared and Visible Image Fusion Method based on Transformer and Cross Correlation

DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion

CGTF: Convolution-Guided Transformer for Infrared and Visible Image Fusion

SimpliFusion: a simplified infrared and visible image fusion network

GTMFuse: Group-Attention Transformer-Driven Multiscale Dense Feature-Enhanced Network for Infrared and Visible Image Fusion

GLFuse: A Global and Local Four-Branch Feature Extraction Network for Infrared and Visible Image Fusion

FuseFormer: A Transformer for Visual and Thermal Image Fusion

Boosting Target-Level Infrared and Visible Image Fusion with Regional Information Coordination

A Deep Learning Framework for Infrared and Visible Image Fusion Without Strict Registration