Abstract: The fusion of far-infrared (FIR) and visible images aims to generate a high-quality composite image that contains salient structures and abundant texture details for human visual perception. However, existing fusion methods typically fail to exploit complementary source-image characteristics to boost the features extracted from degraded visible or FIR images, and therefore cannot generate satisfactory fusion results under adverse lighting or weather conditions. In this paper, we propose a novel Cross-Modal multispectral image Enhancement and Fusion framework (CMEFusion), which adaptively enhances both FIR and visible inputs by leveraging complementary cross-modal features, thereby facilitating multispectral feature aggregation. Specifically, we first present a new cross-modal image enhancement sub-network (CMIENet), built on a CNN-Transformer hybrid architecture, which performs a complementary exchange of the local-salient and global-contextual features extracted from the FIR and visible modalities, respectively. Then, we design a gradient-content differential fusion sub-network (GCDFNet) that progressively integrates decoupled gradient and content information via a modified central difference convolution. Finally, we present a joint enhancement-fusion multi-term loss function that drives the model to narrow the optimization gap between the two sub-networks, using self-supervised terms for exposure, color, structure, and intensity. In this manner, the proposed CMEFusion model performs visible and FIR image fusion end to end and produces fused images with a more natural and realistic appearance. Extensive experiments show that CMEFusion outperforms state-of-the-art image fusion algorithms in both visual quality and quantitative evaluations.
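
The abstract names central difference convolution as the operator behind GCDFNet. As a rough illustration only, the sketch below implements the standard (unmodified) central difference convolution in plain Python. The paper's modified variant is not described in the abstract and is not reproduced here. The function name `central_difference_conv2d`, the `theta` blending parameter, and the "valid" border handling are assumptions made for this example.

```python
def central_difference_conv2d(image, kernel, theta=0.7):
    """Standard central difference convolution on a 2D list ('valid' borders).

    Illustrative sketch only, not the paper's modified operator.
    Each output is a blend of two terms over a k x k window:
      theta       * sum_n w_n * (x_n - x_0)   (gradient-like term)
      (1 - theta) * sum_n w_n * x_n           (ordinary convolution term)
    where x_0 is the window centre. theta = 0 reduces to an ordinary
    convolution (cross-correlation form).
    """
    k = len(kernel)
    if k % 2 == 0 or any(len(row) != k for row in kernel):
        raise ValueError("kernel must be square with odd size")
    r = k // 2
    kernel_sum = sum(sum(row) for row in kernel)
    height, width = len(image), len(image[0])

    out = []
    for i in range(r, height - r):
        out_row = []
        for j in range(r, width - r):
            # Ordinary weighted sum over the window.
            vanilla = sum(
                kernel[a][b] * image[i - r + a][j - r + b]
                for a in range(k)
                for b in range(k)
            )
            # The blend simplifies to: vanilla - theta * x_0 * sum_n w_n.
            out_row.append(vanilla - theta * image[i][j] * kernel_sum)
        out.append(out_row)
    return out
```

With `theta = 1` and an all-ones kernel, a constant image maps to zero, which shows how the difference term suppresses flat intensity and keeps only local variation. Setting `theta = 0` recovers the plain weighted sum.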

DCFusion: Difference correlation-driven fusion mechanism of infrared and visible images

DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion

Correlation-Guided Discriminative Cross-Modality Features Network for Infrared and Visible Image Fusion

DCFusion: Dual-Headed Fusion Strategy and Contextual Information Awareness for Infrared and Visible Remote Sensing Image

TCCFusion: An Infrared and Visible Image Fusion Method based on Transformer and Cross Correlation

SFCFusion: Spatial–Frequency Collaborative Infrared and Visible Image Fusion

Fusion of Infrared and Visible Images Via Multi-Layer Convolutional Sparse Representation

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion

DTFusion: Infrared and Visible Image Fusion Based on Dense Residual PConv-ConvNeXt and Texture-Contrast Compensation

SCFusion: Infrared and Visible Fusion Based on Salient Compensation

DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion

TDDFusion: A Target-Driven Dual Branch Network for Infrared and Visible Image Fusion

MIFFuse: A Multi-Level Feature Fusion Network for Infrared and Visible Images

CMEFusion: Cross-Modal Enhancement and Fusion of FIR and Visible Images

DDFNet-A: Attention-Based Dual-Branch Feature Decomposition Fusion Network for Infrared and Visible Image Fusion

Dif-Fusion: Toward High Color Fidelity in Infrared and Visible Image Fusion With Diffusion Models

Infrared-visible Image Fusion Using Accelerated Convergent Convolutional Dictionary Learning

DATFuse: Infrared and Visible Image Fusion via Dual Attention Transformer

DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion