DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion

Yuchen Guo, Ruoxiang Xu, Rongcheng Li, Zhenghao Wu, Weifeng Su
2024-09-16
Abstract: Multi-modality image fusion aims to integrate complementary information from different imaging modalities into a single image. Existing methods often generate either blurry fused images that lose fine-grained semantic information or unnatural fused images that appear perceptually cropped from the inputs. In this work, we propose a novel two-phase discriminative autoencoder framework, termed DAE-Fuse, that generates sharp and natural fused images. In the adversarial feature extraction phase, we introduce two discriminative blocks into the encoder-decoder architecture, providing an additional adversarial loss that better guides feature extraction while the source images are reconstructed. In the attention-guided cross-modality fusion phase, the two discriminative blocks are adapted to distinguish the structural differences between the fused output and the source inputs, injecting more naturalness into the results. Extensive experiments on public infrared-visible, medical image fusion, and downstream object detection datasets demonstrate our method's superiority and generalizability in both quantitative and qualitative evaluations.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses three problems in Multi-Modality Image Fusion (MMIF):

1. **Image Blurring and Detail Loss**: Existing methods often generate fused images that are blurry and lose fine-grained semantic information, or images that look unnatural, as if they were cropped from the input images.
2. **Insufficient Feature Extraction Capability**: Most existing methods do not design specialized feature extractors and corresponding loss functions for the different characteristics of the features, resulting in weak feature extraction. The fused images they generate exhibit blurring between and within functional objects.
3. **Inter-Modal Bias**: Some autoencoder-based methods can effectively extract global and local features, but during the fusion stage they directly concatenate features instead of organically combining features from different modalities. As a result, the fused image is biased towards the information of one modality and ignores the details of the other.

To address these problems, the authors propose DAE-Fuse, a novel two-phase discriminative autoencoder framework. It generates sharp and natural fused images through adversarial feature extraction followed by attention-guided cross-modal fusion. Extensive experiments on multiple public datasets demonstrate its superiority and generalization capability. Additionally, the method can improve the performance of downstream Multi-Modality Object Detection (MMOD) tasks without fine-tuning.
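
The contrast drawn in point 3, between directly concatenating features and combining them through attention, can be illustrated with a small sketch. This is not the authors' implementation: the summary does not specify the attention design, so the sketch assumes plain scaled dot-product cross-attention over toy feature vectors, and every name in it (`cross_attend`, `fuse_concat`, `fuse_cross_attention`, `ir`, `vis`) is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """For each query vector, return a softmax-weighted mix of keys_values
    (scaled dot-product attention with keys equal to values)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, keys_values))
                    for d in range(dim)])
    return out

def fuse_concat(ir, vis):
    """Baseline: concatenate the two modalities position by position."""
    return [a + b for a, b in zip(ir, vis)]

def fuse_cross_attention(ir, vis):
    """Assumed fusion rule: each modality queries the other, then the two
    context-enriched streams are averaged. Assumes len(ir) == len(vis)."""
    ir_ctx = cross_attend(ir, vis)    # infrared tokens attend to visible
    vis_ctx = cross_attend(vis, ir)   # visible tokens attend to infrared
    fused = []
    for a, a_ctx, b, b_ctx in zip(ir, ir_ctx, vis, vis_ctx):
        fused.append([0.5 * (x + xc + y + yc)
                      for x, xc, y, yc in zip(a, a_ctx, b, b_ctx)])
    return fused

# Toy features: two positions, two channels per modality.
ir = [[1.0, 0.0], [0.0, 1.0]]
vis = [[0.5, 0.5], [1.0, 0.0]]

concat = fuse_concat(ir, vis)           # channel count doubles
fused = fuse_cross_attention(ir, vis)   # channel count is preserved
```

Concatenation leaves it to later layers to relate the two modalities, whereas in this sketch each position already mixes in content from the other modality. The assumed rule is also symmetric in its inputs, so neither modality is privileged by construction.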