Abstract: In infrared and visible image fusion (IVIF), prior-knowledge constraints built on image-level information often ignore the similarities and differences between source-image features, and therefore cannot fully exploit the complementary information that infrared images provide to visible images. To address this, this study develops a Contrastive learning-based Self-Supervised fusion model (CS2Fusion). CS2Fusion treats infrared images as a complement to visible images. It introduces a Compensation Perception Network (CPN), which guides the backbone network to generate fused images by estimating a feature compensation map of the infrared image. The method rests on two observations: (1) there is usually a significant gap in semantic information between different modalities; (2) despite these large semantic differences, features from the same modality tend to have similar self-correlation and saliency distributions. Building on these observations, we use a self-correlation and saliency operation (SSO) to construct positive and negative pairs. Under a contrastive loss, these pairs drive the CPN to perceive the features of infrared images that complement visible images. The CPN also incorporates a self-supervised learning mechanism: visually impaired areas are simulated by randomly cropping patches from visible images. This provides more varied information about the same scene and forms multiple positive samples, which enhance the model's fine-grained perception. In addition, we design a demand-driven module (DDM) in the backbone network. The DDM actively queries information across layers during image reconstruction, improving the information passed between them and integrating more spatial structural information. Notably, the CPN is an auxiliary network used only during training, where it drives the backbone network to perform IVIF in a self-supervised manner.
Experiments on various benchmark datasets and high-level vision tasks demonstrate the superiority of CS2Fusion over state-of-the-art IVIF methods.
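The multi-positive contrastive idea above can be illustrated with a small NumPy sketch. This is not the authors' implementation. The names `random_patch_mask`, `self_correlation_embed`, and `multi_positive_contrastive_loss` are hypothetical, and the patch size, patch count, and temperature `tau` are assumptions. The channel self-correlation descriptor is only a crude stand-in for SSO. The sketch shows how several randomly masked views of the same visible-feature map can serve as multiple positives against an infrared negative under an InfoNCE-style (SupCon-like) loss.

```python
import numpy as np

def random_patch_mask(img, patch=8, n_patches=4, rng=None):
    """Zero out random square patches to simulate visually impaired regions.
    Hypothetical stand-in for the paper's random patch cropping; patch size and
    count are assumptions. Works on (H, W) or (C, H, W) arrays."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()  # never modify the input in place
    h, w = img.shape[-2:]
    for _ in range(n_patches):
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        out[..., y:y + patch, x:x + patch] = 0.0
    return out

def self_correlation_embed(feat):
    """Hypothetical SSO-like descriptor: upper triangle of the (C, C)
    channel self-correlation (covariance) matrix of a (C, H, W) feature map."""
    c = feat.shape[0]
    f = feat.reshape(c, -1)
    f = f - f.mean(axis=1, keepdims=True)
    gram = f @ f.T / f.shape[1]
    return gram[np.triu_indices(c)]

def multi_positive_contrastive_loss(anchor, positives, negatives, tau=0.1):
    """Multi-positive InfoNCE: mean over positives of -log softmax(positive),
    with cosine similarities scaled by temperature tau (an assumed value)."""
    def l2norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    a = l2norm(anchor)
    pos = l2norm(positives) @ a / tau  # (P,) anchor-positive logits
    neg = l2norm(negatives) @ a / tau  # (N,) anchor-negative logits
    logits = np.concatenate([pos, neg])
    m = logits.max()  # stable log-sum-exp
    log_denom = m + np.log(np.exp(logits - m).sum())
    return float(np.mean(log_denom - pos))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vis = rng.random((3, 32, 32))  # toy "visible features" (C, H, W)
    ir = rng.random((3, 32, 32))   # toy "infrared features"
    views = [random_patch_mask(vis, rng=rng) for _ in range(3)]
    anchor = self_correlation_embed(vis)
    positives = np.stack([self_correlation_embed(v) for v in views])
    negatives = self_correlation_embed(ir)[None, :]
    print(multi_positive_contrastive_loss(anchor, positives, negatives))
```

Each masked view acts as an extra positive. Averaging the loss over positives rewards embeddings that stay close to the anchor even when parts of the scene are hidden. That loosely matches the fine-grained perception goal the abstract describes.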

M2CNet: an Infrared and Visible Image Fusion Method Based on Dual Marginal Contrastive Learning

Infrared and Visible Image Fusion Based on a Two-Stage Class Conditioned Auto-Encoder Network

CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion

DCFusion: Difference correlation-driven fusion mechanism of infrared and visible images

CMEFusion: Cross-Modal Enhancement and Fusion of FIR and Visible Images

MFTCFNet: Infrared and Visible Image Fusion Network Based on Multi-Layer Feature Tightly Coupled

DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion

Multi-Modality Image Fusion and Object Detection Based on Semantic Information

MAFusion: Multiscale Attention Network for Infrared and Visible Image Fusion

A Multi-Stage Visible and Infrared Image Fusion Network Based on Attention Mechanism

MMF: A Multi-scale MobileNet Based Fusion Method for Infrared and Visible Image

Correlation-Guided Discriminative Cross-Modality Features Network for Infrared and Visible Image Fusion

DCFusion: Dual-Headed Fusion Strategy and Contextual Information Awareness for Infrared and Visible Remote Sensing Image

CMFA_Net: A cross-modal feature aggregation network for infrared-visible image fusion

A Self-Supervised Fusion for Infrared and Visible Images Via Multi-Level Contrastive Auto-Encoding

SCFusion: Infrared and Visible Fusion Based on Salient Compensation

M2FNet: Multi-modal Fusion Network for Object Detection from Visible and Thermal Infrared Images

CS2Fusion: Contrastive Learning for Self-Supervised Infrared and Visible Image Fusion by Estimating Feature Compensation Map

Target recognition with fusion of visible and infrared images based on mutual learning

Multi-scale attention-based lightweight network with dilated convolutions for infrared and visible image fusion

Contrast Saliency Information Guided Infrared and Visible Image Fusion