Abstract: Infrared images highlight prominent targets through differences in radiation, which makes them usable in both daytime and nighttime conditions. Visible images, on the other hand, offer texture details at high spatial resolution. Infrared and visible image fusion therefore promises to combine the strengths of both. Conventional multiscale transformation (MST) methods, operating in either the frequency or the spatial domain, are good at preserving image details, while deep-learning-based methods have become increasingly popular in image fusion because they preserve high-level semantic features. To tackle the challenge of extracting and fusing cross-modality and cross-domain information, we propose a spatial–frequency collaborative fusion (SFCFusion) framework that effectively fuses spatial and frequency information in the feature space. In the frequency domain, source images are decomposed into base and detail layers with existing frequency decomposition methods. In the spatial domain, a kernel-based saliency generation module is designed to preserve spatial region-level structural information. A deep-learning-based encoder extracts features from the source images, the decomposed images, and the saliency maps. In the shared feature space, we achieve cross-modality SFCFusion through our proposed adaptive fusion scheme. We have conducted experiments comparing SFCFusion with both conventional and deep learning approaches on the TNO, LLVIP, and M3FD datasets. The qualitative and quantitative evaluation results demonstrate the effectiveness of SFCFusion, and we further demonstrate its superiority in the downstream detection task. Our code will be available at https://github.com/ChenHanrui430/SFCFusion.
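The general pattern the abstract builds on, splitting each source into base and detail layers and weighting the sources by a region-level saliency map, can be illustrated in pixel space. The sketch below is a classic image-domain stand-in, not SFCFusion itself. The paper fuses in a learned feature space through an encoder and an adaptive fusion scheme, neither of which is specified here. Assumptions: inputs are aligned grayscale float arrays in [0, 1]. A box (mean) filter plays the role of the frequency decomposition. `box_blur`, `region_saliency`, and `two_scale_fuse` are hypothetical names, and the saliency measure (local mean of absolute deviation from the global mean) is an illustrative substitute for the paper's kernel-based saliency module.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int = 31) -> np.ndarray:
    """Mean filter of odd size k with edge padding, computed via an integral image."""
    assert k % 2 == 1, "kernel size must be odd"
    img = np.asarray(img, dtype=np.float64)
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    # Integral image with a leading zero row/column: ii[i, j] = sum(padded[:i, :j]).
    ii = np.pad(np.cumsum(np.cumsum(padded, axis=0), axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    s = ii[k:k + h, k:k + w] - ii[0:h, k:k + w] - ii[k:k + h, 0:w] + ii[0:h, 0:w]
    return s / (k * k)


def region_saliency(img: np.ndarray, k: int = 15) -> np.ndarray:
    """Hypothetical region-level saliency: local mean of |pixel - global mean|."""
    img = np.asarray(img, dtype=np.float64)
    sal = box_blur(np.abs(img - img.mean()), k)
    # Clamp tiny negative round-off introduced by the integral-image subtraction.
    return np.maximum(sal, 0.0)


def two_scale_fuse(ir: np.ndarray, vis: np.ndarray, k: int = 31, eps: float = 1e-8) -> np.ndarray:
    """Pixel-space two-scale fusion sketch (not the SFCFusion feature-space method)."""
    ir = np.asarray(ir, dtype=np.float64)
    vis = np.asarray(vis, dtype=np.float64)
    # Frequency-style decomposition: low-pass base layer plus residual detail layer.
    ir_base, vis_base = box_blur(ir, k), box_blur(vis, k)
    ir_detail, vis_detail = ir - ir_base, vis - vis_base
    # Saliency-normalized weights for the base layers.
    s_ir, s_vis = region_saliency(ir), region_saliency(vis)
    w_ir = s_ir / (s_ir + s_vis + eps)
    base = w_ir * ir_base + (1.0 - w_ir) * vis_base
    # Choose-max rule for the detail layers keeps the stronger local texture.
    detail = np.where(np.abs(ir_detail) >= np.abs(vis_detail), ir_detail, vis_detail)
    return np.clip(base + detail, 0.0, 1.0)
```

Calling `two_scale_fuse(ir, vis)` on two aligned H×W arrays returns an H×W fused image. The saliency-normalized base weights and the choose-max detail rule are common, simple choices in MST fusion; a learned method such as SFCFusion replaces these hand-crafted rules with features and fusion weights computed in a shared feature space.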

S2F-Net: Shared-Specific Fusion Network for Infrared and Visible Image Fusion

SFCFusion: Spatial–Frequency Collaborative Infrared and Visible Image Fusion

A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion

DDFNet-A: Attention-Based Dual-Branch Feature Decomposition Fusion Network for Infrared and Visible Image Fusion

SADFusion: A multi-scale infrared and visible image fusion method based on salient-aware and domain-specific

Infrared and Visible Image Fusion Based on a Two-Stage Class Conditioned Auto-Encoder Network

BCMFIFuse: A Bilateral Cross-Modal Feature Interaction-Based Network for Infrared and Visible Image Fusion

SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion

Correlation-Guided Discriminative Cross-Modality Features Network for Infrared and Visible Image Fusion

Fusion of Infrared and Visible Images Via Multi-Layer Convolutional Sparse Representation

CMFA_Net: A cross-modal feature aggregation network for infrared-visible image fusion

S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion

DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion

TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven Image Fusion Network

A Multi-Stage Visible and Infrared Image Fusion Network Based on Attention Mechanism

Visible and Infrared Image Fusion Based on Attention and Multiscale Residuals

SCFusion: Infrared and Visible Fusion Based on Salient Compensation

DSA-Net: Infrared and Visible Image Fusion via Dual-Stream Asymmetric Network

An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection

Integrating Parallel Attention Mechanisms and Multi-Scale Features for Infrared and Visible Image Fusion

SFINet: A semantic feature interactive learning network for full-time infrared and visible image fusion