Abstract: Infrared and visible image fusion aims to generate synthetic images that contain both salient targets and abundant texture details. However, both traditional techniques and recent deep learning-based approaches struggle to preserve prominent structures and fine-grained features. In this study, we propose a lightweight infrared and visible image fusion network that uses multi-scale attention modules and hybrid dilated convolutional blocks to preserve significant structural features and fine-grained textural details. First, we design a hybrid dilated convolutional block with different dilation rates, which enlarges the receptive field of the fusion network and thereby enables the extraction of prominent structural features. Compared with other deep learning methods, our method obtains more high-level semantic information without stacking a large number of convolutional blocks, which effectively improves its feature representation ability. Second, distinct attention modules are integrated into different layers of the network to fully exploit the contextual information of the source images, and the total loss guides the fusion process to focus on vital regions and compensate for missing information. Extensive qualitative and quantitative experiments demonstrate the superiority of the proposed method over state-of-the-art methods in both visual quality and evaluation metrics. Experimental results on public datasets show that our method improves entropy (EN) by 4.80%, standard deviation (SD) by 3.97%, correlation coefficient (CC) by 1.86%, the sum of the correlations of differences (SCD) by 9.98%, and multi-scale structural similarity (MS_SSIM) by 5.64%. In addition, experiments on the VIFB dataset further indicate that our approach outperforms other comparable models.
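The receptive-field argument in the abstract can be illustrated with a short calculation. The sketch below is not the authors' network: it assumes three stacked 3×3 convolutions with stride 1 and a hypothetical dilation schedule of (1, 2, 3), since the abstract does not state the actual rates, and compares the result with the same stack without dilation.

```python
def receptive_field(kernel_sizes, dilations, strides=None):
    """Receptive field, in input pixels along one axis, of stacked conv layers."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf = 1    # a single output pixel initially sees one input pixel
    jump = 1  # distance in input pixels between adjacent feature-map positions
    for k, d, s in zip(kernel_sizes, dilations, strides):
        effective_k = d * (k - 1) + 1   # kernel extent once dilation gaps are included
        rf += (effective_k - 1) * jump  # growth contributed by this layer
        jump *= s
    return rf


# Assumed configuration (not taken from the paper): three 3x3 layers, stride 1.
plain = receptive_field([3, 3, 3], [1, 1, 1])   # ordinary convolutions -> 7
hybrid = receptive_field([3, 3, 3], [1, 2, 3])  # hypothetical hybrid dilation rates -> 13

print(f"plain stack:  {plain} x {plain}")
print(f"hybrid stack: {hybrid} x {hybrid}")
```

Under these assumptions the dilated stack covers a 13×13 window with the same number of layers and weights as the 7×7 plain stack, which is the sense in which dilation enlarges the receptive field without adding convolutional blocks.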

Multi-scale attention-based lightweight network with dilated convolutions for infrared and visible image fusion
