Abstract: In recent years, research on infrared and visible image fusion has focused mainly on deep learning‐based approaches, particularly deep neural networks with auto‐encoder architectures. However, these approaches suffer from insufficient feature extraction capability and inefficient fusion strategies. This paper therefore introduces a novel image fusion network to address these limitations of auto‐encoder‐based infrared and visible image fusion networks. In the designed network, the encoder employs a multi‐branch cascade structure whose convolution branches, each with a different kernel size, give the encoder an adaptive receptive field for extracting multi‐scale features. In addition, the fusion layer incorporates a non‐local attention module inspired by the self‐attention mechanism. With its global receptive field, this module is used to build a non‐local attention fusion network, which works together with an l1‐norm spatial fusion strategy to extract, split, filter, and fuse global and local features. Comparative experiments on the TNO and MSRS datasets demonstrate that the proposed method outperforms other state‐of‐the‐art fusion approaches.
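The abstract credits the non‐local attention module with a global receptive field but does not give its internals. As an assumption, the sketch below follows the embedded‐Gaussian non‐local block of Wang et al. ("Non-local Neural Networks", CVPR 2018), which matches the self‐attention description: every position of a feature map is updated with a softmax‐weighted sum over all positions. The names `non_local_block`, `w_theta`, `w_phi`, `w_g`, and `w_out` are hypothetical, the 1×1 convolutions are written as plain matrices, and this is not presented as the authors' implementation.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x: np.ndarray, w_theta: np.ndarray, w_phi: np.ndarray,
                    w_g: np.ndarray, w_out: np.ndarray) -> np.ndarray:
    """Embedded-Gaussian non-local block on a (C, H, W) feature map (assumed variant).

    w_theta, w_phi, w_g: (C', C) matrices standing in for 1x1 convolutions.
    w_out: (C, C') matrix projecting the attention output back to C channels.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # (C, N) with N = H*W positions
    theta = w_theta @ flat                   # (C', N) queries
    phi = w_phi @ flat                       # (C', N) keys
    g = w_g @ flat                           # (C', N) values
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N): row i attends over every j
    y = g @ attn.T                           # y[:, i] = sum_j attn[i, j] * g[:, j]
    return x + (w_out @ y).reshape(c, h, w)  # residual connection
```

Because `w_out` is applied before the residual addition, initializing it to zeros makes the block an identity mapping at the start of training, a common choice for non‐local blocks. Note also that the N×N attention matrix makes memory grow quadratically with the number of pixels.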

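For local features, the abstract names an l1‐norm spatial fusion strategy without further detail. A minimal sketch, assuming the DenseFuse‐style rule: the activity of each source at pixel (x, y) is the l1‐norm of its feature vector across channels, smoothed with a (2r+1)×(2r+1) box average, and the two feature maps are blended with weights proportional to these activities. The names `box_mean` and `l1_norm_fusion`, the edge padding, and the default radius r = 1 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def box_mean(a: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1)x(2r+1) window centred on each pixel (edge padding)."""
    h, w = a.shape
    padded = np.pad(a, r, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2


def l1_norm_fusion(feat_ir: np.ndarray, feat_vis: np.ndarray,
                   r: int = 1, eps: float = 1e-12) -> np.ndarray:
    """Fuse two (C, H, W) feature maps with l1-norm activity weights (assumed rule)."""
    # Activity map: l1-norm across channels, then a local box average.
    act_ir = box_mean(np.abs(feat_ir).sum(axis=0), r)
    act_vis = box_mean(np.abs(feat_vis).sum(axis=0), r)
    denom = act_ir + act_vis + eps           # eps avoids division by zero
    w_ir, w_vis = act_ir / denom, act_vis / denom
    # (H, W) weights broadcast over the channel axis of the (C, H, W) features.
    return w_ir * feat_ir + w_vis * feat_vis
```

Pixels where one modality has stronger feature responses, such as hot targets in the infrared features or textured regions in the visible features, receive a larger weight from that modality, while the two weights at every pixel still sum to (almost exactly) one.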
Cross-UNet: dual-branch infrared and visible image fusion framework based on cross-convolution and attention mechanism

Attention based dual UNET network for infrared and visible image fusion

CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach

Fusion of Infrared and Visible Images Via Multi-Layer Convolutional Sparse Representation

A Cross-scale Iterative Attentional Adversarial Fusion Network for Infrared and Visible Images

Infrared and Visible Image Fusion Based on a Two-Stage Class Conditioned Auto-Encoder Network

Multi-scale unsupervised network for infrared and visible image fusion based on joint attention mechanism

Rethinking Cross-Attention for Infrared and Visible Image Fusion

MCFusion: infrared and visible image fusion based multiscale receptive field and cross-modal enhanced attention mechanism

Integrating Parallel Attention Mechanisms and Multi-Scale Features for Infrared and Visible Image Fusion

An infrared and visible image fusion network based on multi‐scale feature cascades and non‐local attention

Visible and Infrared Image Fusion Based on Attention and Multiscale Residuals

HATF: Multi-Modal Feature Learning for Infrared and Visible Image Fusion via Hybrid Attention Transformer

HDCTfusion: Hybrid Dual-Branch Network Based on CNN and Transformer for Infrared and Visible Image Fusion

CMRFusion: A cross-domain multi-resolution fusion method for infrared and visible image fusion

A Multi-Stage Visible and Infrared Image Fusion Network Based on Attention Mechanism

BCMFIFuse: A Bilateral Cross-Modal Feature Interaction-Based Network for Infrared and Visible Image Fusion

DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion

ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection

CHFusion: A Cross-modality High-resolution Representation Framework for Infrared and Visible Image Fusion