Abstract: Existing image fusion approaches typically use a single deep network to solve different image fusion problems and have achieved promising performance in recent years. However, because no ground-truth output is available, these methods can exploit only the appearance of the source images during training to generate the fused images, which leads to suboptimal solutions. To this end, we advocate a self-evolutionary training formula by introducing a novel memory unit architecture (MUFusion). Specifically, in this unit we utilize the intermediate fusion results obtained during training to collaboratively supervise the fused image. In this way, our fusion results not only learn from the original input images but also benefit from the intermediate output of the network itself. Furthermore, an adaptive unified loss function is designed based on this memory unit. It is composed of two loss terms, i.e., content loss and memory loss. The content loss is calculated from the activity level maps of the source images and constrains the output image to contain specific information. The memory loss is obtained from the previous output of our model and is used to force the network to yield fusion results of higher quality. Because handcrafted activity level maps cannot consistently reflect an accurate salience judgement, we place two adaptive weight terms between the two losses to prevent this degradation. In general, our MUFusion can effectively handle a series of image fusion tasks, including infrared and visible image fusion, multi-focus image fusion, multi-exposure image fusion, and medical image fusion. In the network, the source images are first concatenated along the channel dimension. A densely connected feature extraction network with two scales then extracts deep features from the source images. Following this, the fusion result is obtained by two feature reconstruction blocks with skip connections from the feature extraction network. Qualitative and quantitative experiments on four image fusion subtasks demonstrate the superiority of MUFusion over state-of-the-art methods.
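
The interplay of the content loss, the memory loss, and the two adaptive weights can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how activity level maps or adaptive weights are computed, so the gradient-based `activity_map`, the error-ratio rule in `adaptive_weights`, and all function names here are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def activity_map(img):
    # ASSUMPTION: activity is approximated by local gradient magnitude.
    gy, gx = np.gradient(img.astype(np.float64))
    return np.abs(gx) + np.abs(gy)


def content_loss(fused, src_a, src_b, eps=1e-8):
    # Per-pixel weights derived from the activity maps; they sum to 1.
    act_a, act_b = activity_map(src_a), activity_map(src_b)
    w_a = (act_a + eps) / (act_a + act_b + 2.0 * eps)
    w_b = 1.0 - w_a
    # The more active source dominates the pixel-wise reconstruction target.
    return float(np.mean(w_a * (fused - src_a) ** 2 + w_b * (fused - src_b) ** 2))


def memory_loss(fused, previous_output):
    # Supervision from an intermediate fusion result stored during training.
    return float(np.mean((fused - previous_output) ** 2))


def adaptive_weights(fused, previous_output, src_a, src_b):
    # ASSUMPTION (hypothetical rule): trust the memory more when the stored
    # output fits the sources better than the current output does.
    err_cur = content_loss(fused, src_a, src_b)
    err_mem = content_loss(previous_output, src_a, src_b)
    total = err_cur + err_mem
    if total == 0.0:
        return 0.5, 0.5
    w_memory = err_cur / total
    return 1.0 - w_memory, w_memory


def total_loss(fused, previous_output, src_a, src_b):
    w_content, w_memory = adaptive_weights(fused, previous_output, src_a, src_b)
    return (w_content * content_loss(fused, src_a, src_b)
            + w_memory * memory_loss(fused, previous_output))
```

In a real training loop, `previous_output` would be the fused image produced for the same input pair at an earlier point in training, and the losses would be written in an automatic-differentiation framework rather than NumPy.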

TUFusion: A Transformer-based Universal Fusion Algorithm for Multimodal Images

Multi-Modal Image Fusion Via Deep Laplacian Pyramid Hybrid Network

Trans2Fuse: Empowering image fusion through self-supervised learning and multi-modal transformations via transformer networks

RTFusion: A Multimodal Fusion Network with Significant Information Enhancement

MEFusion: Unsupervised Mutual Enhancement for Multimodal Image Fusion

U2Fusion: A Unified Unsupervised Image Fusion Network

TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning

Transformer-Based End-to-End Anatomical and Functional Image Fusion

MACTFusion: Lightweight Cross Transformer for Adaptive Multimodal Medical Image Fusion

Multi-Focus Image Fusion Using U-Shaped Networks with a Hybrid Objective

THFuse: An Infrared and Visible Image Fusion Network using Transformer and Hybrid Feature Extractor

MDC-RHT: Multi-Modal Medical Image Fusion via Multi-Dimensional Dynamic Convolution and Residual Hybrid Transformer

Image Fusion Transformer

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

Multimodal Token Fusion for Vision Transformers

Mutually Beneficial Transformer for Multimodal Data Fusion

A multimodal hyper-fusion transformer for remote sensing image classification

Feature Fusion Based on Transformer for Cross-modal Retrieval

MUFusion: A general unsupervised image fusion network based on memory unit

CrossFuse: A Novel Cross Attention Mechanism based Infrared and Visible Image Fusion Approach

FuseFormer: A Transformer for Visual and Thermal Image Fusion