Abstract: Infrared and visible image fusion aims to produce an informative image by combining complementary information from different sensors. Existing learning-based fusion approaches construct various loss functions to preserve complementary features but neglect the inter-relationship between the two modalities, leading to redundant or even invalid information in the fusion results. Moreover, most methods strengthen the network by increasing its depth while neglecting the importance of feature transmission, causing vital information to degrade. To alleviate these issues, we propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion in an end-to-end manner. Concretely, to simultaneously retain typical features from both modalities and to avoid artifacts in the fused result, we develop a coupled contrastive constraint in our loss function. In a fused image, the foreground target is pulled close to the infrared source and pushed away from the visible source in the representation space, while the background detail is pulled close to the visible source and pushed away from the infrared source. We further exploit image characteristics to provide data-sensitive weights, allowing our loss function to build a more reliable relationship with the source images. A multi-level attention module is established to learn rich hierarchical feature representations and to comprehensively transfer features in the fusion process. We also apply the proposed CoCoNet to medical image fusion of different types, e.g., magnetic resonance images, positron emission tomography images, and single photon emission computed tomography images. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation, especially in preserving prominent targets and recovering vital textural details.

What problem does this paper attempt to address?

This paper addresses how to retain complementary information while eliminating redundant information in infrared and visible-light image fusion, and thereby improve the quality of the fused image. Specifically:

1. **Retention of Complementary Information and Elimination of Redundancy**: Existing learning-based methods often overlook the intrinsic relationship between the two modalities when constructing the loss function, resulting in redundant or even invalid information in the fusion results. Moreover, most methods strengthen the network by increasing its depth while ignoring the importance of feature transmission, which may degrade key information. A method is therefore required that retains the typical features of each modality while avoiding artifacts in the fusion results.
2. **Adaptation to Specific Source Images**: Existing methods usually rely on manually tuned trade-off parameters in the loss function, which is time-consuming and adapts poorly to differing source image characteristics. A data-driven mechanism is therefore needed to compute the degree of information retention automatically, enhancing the intensity and detail consistency between the source images and the fusion results.
3. **Multi-level Feature Representation**: To ensure that features are used comprehensively in the fusion process, a multi-level attention module is needed to learn rich hierarchical feature representations and avoid feature degradation.

To this end, the authors propose a Coupled Contrastive Learning Network (CoCoNet). Coupled contrastive constraints guide the model to distinguish salient complementary features (such as objects and texture details), enabling it to extract and fuse the required features from each modality. In addition, a multi-level attention module lets the network learn rich hierarchical feature representations and ensures that these features are fully used in the fusion process. Experimental results show that CoCoNet outperforms nine other state-of-the-art infrared and visible-light image fusion methods on multiple datasets, especially in retaining salient objects and restoring key texture details.
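The pull/push behaviour of the coupled contrastive constraint can be illustrated with a small NumPy sketch. This is not the authors' implementation. The ratio form of the loss, the binary foreground `mask`, the identity `embed` function standing in for a learned feature extractor, and all function names are assumptions made for illustration.

```python
import numpy as np


def l1_distance(a, b):
    """Mean absolute difference between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def coupled_contrastive_loss(fused, infrared, visible, mask,
                             embed=lambda x: x, eps=1e-8):
    """Hypothetical coupled contrastive loss (illustrative assumption).

    mask is 1 on foreground targets and 0 on background. Each term is a
    ratio: distance to the positive source over distance to the negative
    source, so minimising it pulls towards the positive and pushes away
    from the negative.
    """
    background = 1.0 - mask

    # Foreground: infrared is the positive, visible is the negative.
    fg_pos = l1_distance(embed(fused * mask), embed(infrared * mask))
    fg_neg = l1_distance(embed(fused * mask), embed(visible * mask))

    # Background: visible is the positive, infrared is the negative.
    bg_pos = l1_distance(embed(fused * background), embed(visible * background))
    bg_neg = l1_distance(embed(fused * background), embed(infrared * background))

    return fg_pos / (fg_neg + eps) + bg_pos / (bg_neg + eps)


# Toy data: random "images" and a square foreground region.
rng = np.random.default_rng(0)
infrared = rng.random((8, 8))
visible = rng.random((8, 8))
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1.0

# A fusion that follows the constraint, and one that inverts it.
ideal = infrared * mask + visible * (1.0 - mask)
swapped = visible * mask + infrared * (1.0 - mask)

loss_ideal = coupled_contrastive_loss(ideal, infrared, visible, mask)
loss_swapped = coupled_contrastive_loss(swapped, infrared, visible, mask)
print(loss_ideal, loss_swapped)
```

In this toy setting, a fused image that takes its foreground from the infrared source and its background from the visible source scores lower than one with the roles swapped, which is the direction the constraint is described as enforcing. A real system would compare learned feature representations rather than raw pixels.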

CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion
