Abstract: Image fusion is a technique that integrates complementary information from multiple source images to improve the richness of a single image. Because task-specific training data and the corresponding ground truth are scarce, most existing end-to-end image fusion methods are prone to overfitting or require tedious parameter optimization. Two-stage methods avoid the need for large amounts of task-specific training data by training an encoder-decoder network on large natural image datasets and using the extracted features for fusion, but the domain gap between natural images and the different fusion tasks limits their performance. In this study, we design a novel encoder-decoder based image fusion framework and propose a destruction-reconstruction based self-supervised training scheme that encourages the network to learn task-specific features. Specifically, we propose three destruction-reconstruction self-supervised auxiliary tasks for multi-modal image fusion, multi-exposure image fusion and multi-focus image fusion, based on pixel intensity non-linear transformation, brightness transformation and noise transformation, respectively. To encourage the different fusion tasks to promote each other and to increase the generalizability of the trained network, we integrate the three self-supervised auxiliary tasks by randomly choosing one of them to destroy a natural image during model training. In addition, we design a new encoder that combines a CNN and a Transformer for feature extraction, so that the trained model can exploit both local and global information. Extensive experiments on multi-modal image fusion, multi-exposure image fusion and multi-focus image fusion tasks demonstrate that our proposed method achieves state-of-the-art performance in both subjective and objective evaluations. The code will be publicly available soon.
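
The destruction-reconstruction scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the gamma curve used as the non-linear intensity transform, the parameter ranges, the Gaussian noise model, and the mean-squared-error reconstruction loss are all assumptions chosen for illustration. Only the overall idea comes from the abstract: randomly pick one of three transformations, destroy a natural image with it, and ask the network to reconstruct the original.

```python
import numpy as np


def nonlinear_intensity(img, rng):
    """Proxy task for multi-modal fusion: non-linear pixel intensity remapping.
    A gamma curve is an assumed stand-in for the paper's transformation."""
    gamma = rng.uniform(0.4, 2.5)  # hypothetical range
    return img ** gamma


def brightness_shift(img, rng):
    """Proxy task for multi-exposure fusion: global brightness change."""
    scale = rng.uniform(0.3, 1.7)  # hypothetical range
    return np.clip(img * scale, 0.0, 1.0)


def add_noise(img, rng):
    """Proxy task for multi-focus fusion: additive Gaussian noise (assumed model)."""
    sigma = rng.uniform(0.02, 0.1)  # hypothetical range
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)


DESTRUCTIONS = (nonlinear_intensity, brightness_shift, add_noise)


def make_training_pair(img, rng):
    """Randomly choose one destruction and return (destroyed input, clean target)."""
    idx = int(rng.integers(len(DESTRUCTIONS)))
    destroyed = DESTRUCTIONS[idx](img, rng)
    return destroyed, img


def reconstruction_loss(pred, target):
    """Mean squared error between the reconstruction and the clean image
    (an assumed loss; the abstract does not specify one)."""
    return float(np.mean((pred - target) ** 2))
```

In a training loop, `make_training_pair` would be called on each natural image with values in [0, 1], the encoder-decoder would map the destroyed image to a reconstruction, and `reconstruction_loss` would be minimized against the clean target.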

LFDT-Fusion: A Latent Feature-Guided Diffusion Transformer Model for General Image Fusion

DGFusion: An Effective Dynamic Generalizable Network for Infrared and Visible Image Fusion

Image Fusion Based on Feature Decoupling and Proportion Preserving

FusionDiff: A unified image fusion network based on diffusion probabilistic models

DGLT-Fusion: A Decoupled Global–local Infrared and Visible Image Fusion Transformer

UNIFusion: A Lightweight Unified Image Fusion Network

Fusion-UDCGAN: Multifocus Image Fusion via a U-Type Densely Connected Generation Adversarial Network

FusionDiff: Multi-focus image fusion using denoising diffusion probabilistic models

DATFuse: Infrared and Visible Image Fusion via Dual Attention Transformer

SDTFusion: A split-head dense transformer based network for infrared and visible image fusion

ASFFuse: Infrared and Visible Image Fusion Model Based on Adaptive Selection Feature Maps

A General Image Fusion Framework Using Multi-Task Semi-Supervised Learning

Infrared/Visible Light Fire Image Fusion Method Based on Generative Adversarial Network of Wavelet-Guided Pooling Vision Transformer

Multimodal Image Fusion Based on Diffusion Model

DM-Fusion: Deep Model-Driven Network for Heterogeneous Image Fusion

DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion

THFuse: An Infrared and Visible Image Fusion Network using Transformer and Hybrid Feature Extractor

DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning

Trans2Fuse: Empowering image fusion through self-supervised learning and multi-modal transformations via transformer networks

FusionDN: A Unified Densely Connected Network for Image Fusion