Abstract: The fusion of far-infrared (FIR) and visible images aims to generate a high-quality composite image that contains salient structures and abundant texture details for human visual perception. However, existing fusion methods typically fail to exploit the complementary characteristics of the source images to strengthen the features extracted from degraded visible or FIR inputs, and therefore cannot produce satisfactory fusion results under adverse lighting or weather conditions. In this paper, we propose a novel Cross-Modal multispectral image Enhancement and Fusion framework (CMEFusion), which adaptively enhances both FIR and visible inputs by leveraging complementary cross-modal features, thereby facilitating multispectral feature aggregation. Specifically, we first present a new cross-modal image enhancement sub-network (CMIENet), built on a CNN-Transformer hybrid architecture, that performs a complementary exchange of the local-salient and global-contextual features extracted from the FIR and visible modalities, respectively. Then, we design a gradient-content differential fusion sub-network (GCDFNet) that progressively integrates decoupled gradient and content information via a modified central difference convolution. Finally, we present a joint enhancement-fusion multi-term loss function that narrows the optimization gap between the two sub-networks using self-supervised terms for exposure, color, structure, and intensity. In this manner, the proposed CMEFusion model performs visible and FIR image fusion end to end and produces fused images with improved visual quality and a more natural, realistic appearance. Extensive experiments show that CMEFusion surpasses state-of-the-art image fusion algorithms in both visual quality and quantitative evaluations.
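
As an illustration of the central difference convolution that GCDFNet builds on, the following minimal Python sketch implements the standard, unmodified operator on a single-channel image. It assumes the common formulation in which the output is the ordinary convolution response minus theta times the centre pixel times the sum of the kernel weights. The function name `central_difference_conv2d`, the parameter `theta`, and the toy inputs are hypothetical; the modified variant and the gradient-content decoupling described in the abstract are not reproduced here.

```python
def central_difference_conv2d(image, kernel, theta=0.7):
    """Valid (no padding) central difference convolution on nested lists.

    Assumed formulation (standard CDC, not the paper's modified variant):
        y(p0) = sum_n w(pn) * x(p0 + pn) - theta * x(p0) * sum_n w(pn)
    theta = 0 gives an ordinary convolution; theta = 1 keeps only the
    differences between each neighbour and the centre pixel.
    """
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2  # offset of the centre tap inside the kernel
    kernel_sum = sum(sum(row) for row in kernel)
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1

    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            # Ordinary (cross-correlation style) convolution response.
            vanilla = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            # Subtract the centre-pixel term weighted by theta.
            center = image[i + ch][j + cw]
            out_row.append(vanilla - theta * center * kernel_sum)
        output.append(out_row)
    return output


if __name__ == "__main__":
    flat = [[5] * 4 for _ in range(4)]   # constant patch, no gradient
    ones = [[1] * 3 for _ in range(3)]   # toy 3x3 kernel
    print(central_difference_conv2d(flat, ones, theta=1.0))
    print(central_difference_conv2d(flat, ones, theta=0.0))
```

On a constant patch the response with `theta=1.0` is zero everywhere, because every neighbour equals the centre pixel, while `theta=0.0` recovers the ordinary convolution. This is why the operator is often used to emphasise gradient information over raw intensity.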

LeGFusion: Locally Enhanced Global Learning for Multimodal Image Fusion

GALFusion: Multi-Exposure Image Fusion via a Global–Local Aggregation Learning Network

Multimodal Image Fusion Via Self-Supervised Transformer

CMEFusion: Cross-Modal Enhancement and Fusion of FIR and Visible Images

Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion

Equivariant Multi-Modality Image Fusion

Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion

Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion

Multi-modal medical image fusion based on densely-connected high-resolution CNN and hybrid transformer

SADFusion: A multi-scale infrared and visible image fusion method based on salient-aware and domain-specific

MM-Net: A MixFormer-Based Multi-Scale Network for Anatomical and Functional Image Fusion

MEFusion: Unsupervised Mutual Enhancement for Multimodal Image Fusion

CIRF: Coupled Image Reconstruction and Fusion Strategy for Deep Learning Based Multi-Modal Image Fusion

A Multi-Stage Visible and Infrared Image Fusion Network Based on Attention Mechanism

A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion

DCFusion: Dual-Headed Fusion Strategy and Contextual Information Awareness for Infrared and Visible Remote Sensing Image

Diff-IF: Multi-modality image fusion via diffusion model with fusion knowledge prior

FusionMLP: A MLP-Based Unified Image Fusion Framework

IGNFusion: An Unsupervised Information Gate Network for Multimodal Medical Image Fusion