DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion

Yuchen Guo, Ruoxiang Xu, Rongcheng Li, Zhenghao Wu, Weifeng Su

2024-09-16

Abstract: Multi-modality image fusion aims to integrate complementary information from different imaging modalities into a single image. Existing methods often generate either blurry fused images that lose fine-grained semantic information or unnatural fused images that appear perceptually cropped from the inputs. In this work, we propose a novel two-phase discriminative autoencoder framework, termed DAE-Fuse, that generates sharp and natural fused images. In the adversarial feature extraction phase, we introduce two discriminative blocks into the encoder-decoder architecture, providing an additional adversarial loss that better guides feature extraction through reconstruction of the source images. In the attention-guided cross-modality fusion phase, the two discriminative blocks are adapted to distinguish the structural differences between the fused output and the source inputs, injecting more naturalness into the results. Extensive experiments on public infrared-visible, medical image fusion, and downstream object detection datasets demonstrate our method's superiority and generalizability in both quantitative and qualitative evaluations.
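To make the first phase concrete, the sketch below combines a reconstruction term with an adversarial term, which is the general shape the abstract describes. It is an illustration under stated assumptions, not the paper's actual objective: the abstract does not give the loss formulas, so the mean squared error reconstruction term, the non-saturating adversarial term, the one-discriminator-per-modality layout, the weight `adv_weight`, and all function names are hypothetical.

```python
import math


def mse(x, y):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def generator_adv_loss(d_score):
    """Non-saturating adversarial loss for the autoencoder (assumed form).

    d_score is a discriminator's probability that a reconstruction is real.
    The loss shrinks toward 0 as d_score approaches 1.
    """
    eps = 1e-12  # guards against log(0)
    return -math.log(max(d_score, eps))


def phase_one_loss(ir, ir_rec, vis, vis_rec, d_ir_score, d_vis_score,
                   adv_weight=0.1):
    """Reconstruction loss plus a weighted adversarial loss.

    Assumption: one discriminator scores each modality's reconstruction.
    adv_weight is a hypothetical balancing coefficient.
    """
    recon = mse(ir, ir_rec) + mse(vis, vis_rec)
    adv = generator_adv_loss(d_ir_score) + generator_adv_loss(d_vis_score)
    return recon + adv_weight * adv
```

With this form, a perfect reconstruction that both discriminators accept as real drives the loss to zero, while a reconstruction the discriminators reject is penalized even when its pixel error is small.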

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses three problems in Multi-Modality Image Fusion (MMIF):

1. **Image Blurring and Detail Loss**: Existing methods often generate fused images that are blurry and lose fine-grained semantic information, or images that look unnatural, as if they were cropped from the input images.
2. **Insufficient Feature Extraction Capability**: Most existing methods do not design specialized feature extractors and corresponding loss functions for the different characteristics of features, which results in weak feature extraction. The generated fused images exhibit blurring between and within functional objects.
3. **Inter-Modal Bias**: Some autoencoder-based methods extract global and local features effectively, but in the fusion stage they directly concatenate features instead of organically combining features from the different modalities. The fused image is then biased towards the information of one modality and ignores the details of the other.

To address these issues, the authors propose a novel two-phase discriminative autoencoder framework, DAE-Fuse. The framework generates clear and natural fused images through adversarial feature extraction and attention-guided cross-modality fusion. Extensive experiments on multiple public datasets demonstrate its superiority and generalization capability. Additionally, the method can enhance the performance of downstream Multi-Modality Object Detection (MMOD) tasks without fine-tuning.
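The contrast between direct concatenation and attention-based combination can be sketched on toy feature tokens. This is a minimal illustration of generic scaled dot-product cross-attention, not the fusion module used in DAE-Fuse, whose design is not described here. The function names, the residual-style averaging, and the use of raw tokens as queries, keys, and values (with no learned projections) are all assumptions made to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_attend(queries, context):
    """For each query token, return a softmax-weighted average of context tokens."""
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in context])
        attended.append([sum(w * k[d] for w, k in zip(weights, context))
                         for d in range(len(q))])
    return attended


def fuse_concat(ir_tokens, vis_tokens):
    """Baseline: place the two feature vectors side by side, no interaction."""
    return [ir + vis for ir, vis in zip(ir_tokens, vis_tokens)]


def fuse_cross_attention(ir_tokens, vis_tokens):
    """Each modality queries the other; the enriched tokens are averaged.

    Assumes both modalities have the same number of tokens and dimensions.
    """
    ir_view = cross_attend(ir_tokens, vis_tokens)   # what IR gathers from visible
    vis_view = cross_attend(vis_tokens, ir_tokens)  # what visible gathers from IR
    fused = []
    for ir, iv, vis, vv in zip(ir_tokens, ir_view, vis_tokens, vis_view):
        fused.append([0.5 * (a + b + c + d)
                      for a, b, c, d in zip(ir, iv, vis, vv)])
    return fused
```

In the concatenation baseline, a later layer has to learn on its own how to weigh the two halves of each token. In the attention variant, every fused token already mixes content from both modalities, weighted by feature similarity, which is the kind of interaction the summary contrasts with direct concatenation.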

Infrared and Visible Image Fusion Based on a Two-Stage Class Conditioned Auto-Encoder Network

DIDFuse: Deep Image Decomposition for Infrared and Visible Image Fusion

AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention

DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

Unsupervised Image Fusion Method based on Feature Mutual Mapping

Advancing infrared and visible image fusion with an enhanced multiscale encoder and attention-based networks

DPACFuse: Dual-Branch Progressive Learning for Infrared and Visible Image Fusion with Complementary Self-Attention and Convolution

DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion

OMOFuse: an Optimized Dual-Attention Mechanism Model for Infrared and Visible Image Fusion

MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training

DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion

SADFusion: A multi-scale infrared and visible image fusion method based on salient-aware and domain-specific

When Image Decomposition Meets Deep Learning: A Novel Infrared and Visible Image Fusion Method

Correlation-Guided Discriminative Cross-Modality Features Network for Infrared and Visible Image Fusion

Infrared and Visible Image Fusion via Interactive Compensatory Attention Adversarial Learning

DATFuse: Infrared and Visible Image Fusion via Dual Attention Transformer

MEEAFusion: Multi-Scale Edge Enhancement and Joint Attention Mechanism Based Infrared and Visible Image Fusion

Adaptive spatial and frequency experts fusion network for medical image fusion

Image Fusion Based on Feature Decoupling and Proportion Preserving