Photorealistic Image Fusion Using Y-Net-Based Extractor and Global-Local Discriminator

Danqing Yang, Naibo Zhu, Xiaorui Wang, Shuang Li
DOI: https://doi.org/10.2139/ssrn.4697016
2024-01-01
Abstract: Although some deep learning-based image fusion approaches have achieved promising results, it remains challenging to extract information-rich features from different source images and preserve them in the fused image with little distortion. Here, we propose a GAN-based scheme with a multi-scale feature extractor and a global-local discriminator for photorealistic infrared and visible image fusion. Because diverse learning-based solutions benefit from multi-scale representations, we design the generator on the basis of Y-Net and incorporate the idea of the residual dense block (RDblock). It learns discriminative multi-scale representations that are closer to the essence of each modality, yielding more realistic fused images from infrared and visible inputs. During feature reconstruction, cross-modality shortcuts with contextual attention (CMSCA) selectively aggregate features at different scales and levels to construct information-rich fused images with better visual quality. To improve the information content and aesthetics of the fused image, we not only constrain structure and contrast information using the structural similarity index (SSIM), but also evaluate intensity and gradient similarities at both the feature and image levels. Two global-local discriminators, each combining a global GAN discriminator with PatchGAN in a unified architecture, help uncover finer differences between the generated image and the reference images. This forces the generator to learn both the local radiation information and the pervasive global details of the two source images. Notably, image fusion is achieved through adversarial training without any fusion rules. Extensive evaluation experiments demonstrate that the proposed fusion scheme outperforms state-of-the-art methods in both meaningful information preservation and image aesthetics.
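The abstract names the ingredients of the content loss (an SSIM term for structure and contrast, plus intensity and gradient similarity terms) but not their exact form. Below is a minimal NumPy sketch of image-level terms of that kind. It is written under stated assumptions and is not the authors' formulation. Several pieces are hypothetical illustrative choices: SSIM is computed over the whole image as a single window rather than with the usual sliding Gaussian window, and the intensity and gradient targets are the element-wise maxima of the two sources (a common pattern in infrared-visible fusion). The names content_loss, global_ssim and gradient_magnitude and the weights w_ssim, w_int and w_grad are likewise hypothetical. The feature-level terms and the adversarial (global-local discriminator) losses are not modeled.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of the
    standard sliding Gaussian-window SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def gradient_magnitude(img):
    """L1 gradient magnitude from forward differences; the last column/row is
    repeated so the output keeps the input shape (border differences are 0)."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return np.abs(gx) + np.abs(gy)

def content_loss(fused, ir, vis, w_ssim=1.0, w_int=1.0, w_grad=1.0):
    """Hypothetical image-level content loss: SSIM + intensity + gradient terms."""
    # Structure/contrast: 1 - SSIM, averaged over both sources (assumed equal weighting)
    l_ssim = 1.0 - 0.5 * (global_ssim(fused, ir) + global_ssim(fused, vis))
    # Intensity: mean L1 distance to the element-wise max of the sources (assumed target)
    l_int = np.abs(fused - np.maximum(ir, vis)).mean()
    # Gradient: mean L1 distance to the element-wise max of the source gradients (assumed target)
    target_grad = np.maximum(gradient_magnitude(ir), gradient_magnitude(vis))
    l_grad = np.abs(gradient_magnitude(fused) - target_grad).mean()
    return w_ssim * l_ssim + w_int * l_int + w_grad * l_grad

# Usage: score a naive averaging "fusion" of two random single-channel images in [0, 1]
rng = np.random.default_rng(0)
ir = rng.random((32, 32))
vis = rng.random((32, 32))
naive = 0.5 * (ir + vis)
print(content_loss(naive, ir, vis))
```

In a GAN-based scheme such as the one described, a term like this would be added to the generator's adversarial loss. The SSIM term rewards preserved structure, while the two L1 terms push the fused image toward the brighter thermal targets and the sharper texture of either source.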