Abstract: Multimodal image fusion has recently garnered increasing interest in the field of remote sensing. By leveraging the complementary information in different modalities, the fused results may characterize objects of interest more favorably, thereby increasing the chance of a more comprehensive and accurate perception of the scene. Unfortunately, most existing fusion methods extract modality-specific features independently, without considering intermodal alignment and complementarity, which leads to a suboptimal fusion process. To address this issue, we propose a novel interactive guided generative adversarial network (IG-GAN) for multimodal image fusion. IG-GAN comprises guided dual streams tailored for enhanced learning of details and content, as well as cross-modal consistency. Specifically, a details-guided interactive running-in module (GIR1) and a content-guided interactive running-in module (GIR2) are developed, with the stronger modality serving as guidance for detail richness or content integrity, and the weaker one assisting. To fully integrate multigranularity features from the two modalities, a hierarchical fusion and reconstruction branch is established: a shallow interactive fusion (SIF) module followed by a multilevel interactive fusion (MIF) module aggregates multilevel local and long-range features. For feature decoding and fused image generation, a high-level interactive fusion and reconstruction module (HRM) is further developed. In addition, to enable the fusion network to generate fused images with complete content, sharp edges, and high fidelity without supervision, a loss function that drives the adversarial game between the generator and two discriminators is formulated. Comparative experiments with 14 state-of-the-art methods are conducted on three datasets. Qualitative and quantitative results indicate that IG-GAN is clearly superior in terms of both visual effect and quantitative metrics. Moreover, experiments on two RGB-IR object detection datasets demonstrate that IG-GAN can enhance the accuracy of object detection by integrating complementary information from different modalities.

Transformer Based Conditional GAN for Multimodal Image Fusion

Statistics Enhancement Generative Adversarial Networks for Diverse Conditional Image Synthesis

CT and MRI Image Fusion via Coupled Feature-Learning GAN

IG-GAN: Interactive Guided Generative Adversarial Networks for Multimodal Image Fusion

MHW-GAN: Multidiscriminator Hierarchical Wavelet Generative Adversarial Network for Multimodal Image Fusion

AT-GAN: A generative adversarial network with attention and transition for infrared and visible image fusion

Fusion-UDCGAN: Multifocus Image Fusion via a U-Type Densely Connected Generation Adversarial Network

Multi-Modal Image Fusion Via Deep Laplacian Pyramid Hybrid Network

FuseGAN: Learning to fuse Multi-focus Image via Conditional Generative Adversarial Network

Two-stream Maximal Feature Attention-guided Contrastive-learning GAN for Image Fusion

MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis

Multi-focus images fusion via residual generative adversarial network

Siamese conditional generative adversarial network for multi-focus image fusion

A generative adversarial network with adaptive constraints for multi-focus image fusion

DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion

Coupled GAN With Relativistic Discriminators for Infrared and Visible Images Fusion

A Generative Adversarial Network For Medical Image Fusion

GANMcC: A Generative Adversarial Network With Multiclassification Constraints for Infrared and Visible Image Fusion

An attention-guided and wavelet-constrained generative adversarial network for infrared and visible image fusion

FFusionCGAN: An end-to-end fusion method for few-focus images using conditional GAN in cytopathological digital slides

CT-GAN: A conditional Generative Adversarial Network of transformer architecture for text-to-image