CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion

Jinyuan Liu, Runjia Lin, Guanyao Wu, Risheng Liu, Zhongxuan Luo, Xin Fan
2023-10-14
Abstract: Infrared and visible image fusion aims to provide an informative image by combining complementary information from different sensors. Existing learning-based fusion approaches attempt to construct various loss functions to preserve complementary features, while neglecting to discover the inter-relationship between the two modalities, leading to redundant or even invalid information in the fusion results. Moreover, most methods focus on strengthening the network with an increase in depth while neglecting the importance of feature transmission, causing vital information degeneration. To alleviate these issues, we propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion in an end-to-end manner. Concretely, to simultaneously retain typical features from both modalities and to avoid artifacts emerging on the fused result, we develop a coupled contrastive constraint in our loss function. In a fused image, its foreground target / background detail part is pulled close to the infrared / visible source and pushed far away from the visible / infrared source in the representation space. We further exploit image characteristics to provide data-sensitive weights, allowing our loss function to build a more reliable relationship with source images. A multi-level attention module is established to learn rich hierarchical feature representation and to comprehensively transfer features in the fusion process. We also apply the proposed CoCoNet to medical image fusion of different types, e.g., magnetic resonance images, positron emission tomography images, and single photon emission computed tomography images. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation, especially in preserving prominent targets and recovering vital textural details.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to retain complementary information while eliminating redundant information in infrared and visible-light image fusion, and improve the quality of the fused image. Specifically:

1. **Retention of Complementary Information and Elimination of Redundancy**: Existing learning-based methods often overlook the discovery of the intrinsic relationship between the two modalities when constructing the loss function, resulting in redundant or even invalid information in the fusion results. Moreover, most methods focus on strengthening the network by increasing the network depth while ignoring the importance of feature transmission, which may lead to the degradation of key information. Therefore, a method is required to effectively retain the typical features from different modalities while avoiding artifacts in the fusion results.
2. **Adaptation to Specific Source Images**: Existing methods usually rely on manually adjusting the trade-off parameters in the loss function, which is time-consuming and difficult to adapt to different source image characteristics. Therefore, a data-driven mechanism is needed to automatically calculate the degree of information retention to enhance the intensity and detail consistency between the source images and the fusion results.
3. **Multi-level Feature Representation**: To ensure the comprehensive utilization of features in the fusion process, a multi-level attention module needs to be designed to learn rich hierarchical feature representations and effectively avoid feature degradation.

To this end, the authors propose a Coupled Contrastive Learning Network (CoCoNet). By introducing coupled contrastive constraints, the model is guided to distinguish significant complementary features (such as objects and texture details), enabling the model to extract and fuse the required features from each modality.
In addition, by designing a multi-level attention module, the network can learn rich hierarchical feature representations, ensuring that these features are fully utilized in the fusion process. Experimental results show that CoCoNet outperforms nine other state-of-the-art infrared and visible-light image fusion methods on multiple datasets, especially in retaining significant objects and restoring key texture details.
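
To make the coupled contrastive constraint more concrete, the following is a minimal sketch of the pull/push idea stated in the abstract: the foreground of the fused image is pulled toward the infrared source and pushed away from the visible source, and the background is treated the other way around. It is not the authors' implementation. The function names, the binary foreground `mask`, the identity `features` extractor (standing in for a learned representation space), the L1 distance, and the ratio form of the loss are all assumptions chosen for illustration.

```python
import numpy as np


def l1_distance(a, b):
    """Mean absolute difference between two feature arrays."""
    return float(np.mean(np.abs(a - b)))


def coupled_contrastive_loss(fused, infrared, visible, mask,
                             features=lambda x: x, eps=1e-8):
    """Illustrative coupled contrastive loss (hypothetical form).

    mask is 1.0 on foreground-target pixels and 0.0 on background pixels.
    features maps an image to a representation; the identity default is a
    placeholder for a learned feature extractor.
    """
    background = 1.0 - mask

    # Target term: fused foreground is the anchor, the infrared foreground
    # is the positive (numerator), the visible foreground is the negative
    # (denominator).
    anchor_fg = features(fused * mask)
    target_term = (
        l1_distance(anchor_fg, features(infrared * mask))
        / (l1_distance(anchor_fg, features(visible * mask)) + eps)
    )

    # Detail term: roles are swapped on the background region.
    anchor_bg = features(fused * background)
    detail_term = (
        l1_distance(anchor_bg, features(visible * background))
        / (l1_distance(anchor_bg, features(infrared * background)) + eps)
    )

    return target_term + detail_term


# Toy usage with random "images" and a square foreground mask.
rng = np.random.default_rng(0)
ir = rng.random((8, 8))
vis = rng.random((8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

# Desired fusion: infrared target on a visible background.
good = mask * ir + (1.0 - mask) * vis
# Undesired fusion: the two sources are swapped.
bad = mask * vis + (1.0 - mask) * ir

loss_good = coupled_contrastive_loss(good, ir, vis, mask)
loss_bad = coupled_contrastive_loss(bad, ir, vis, mask)
print(loss_good < loss_bad)
```

In this toy setting the desired fusion matches its positives exactly, so both numerators vanish and its loss is lower than that of the swapped fusion. A real training setup would replace the identity `features` with a learned or pretrained feature extractor and would need a source for the foreground mask, neither of which is specified in this summary.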