Transformer Based Conditional GAN for Multimodal Image Fusion

Jun Zhang, Licheng Jiao, Wenping Ma, Fang Liu, Xu Liu, Lingling Li, Puhua Chen, Shuyuan Yang
DOI: https://doi.org/10.1109/tmm.2023.3243659
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Multimodal image fusion is increasingly important for making use of multi-sensor information. However, existing end-to-end image fusion frameworks ignore the integration of a priori knowledge and long-distance dependencies across domains, which hampers network convergence and global image perception in complex scenes. In this article, a transformer-based conditional generative adversarial network (TCGAN) is proposed for multimodal image fusion. The generator produces a fused image that carries the content of the source images. The discriminators are adopted to distinguish the fused image from the source images. Adversarial training lets the final fused image maintain the structural and textural details of the cross-modal images simultaneously. In particular, a wavelet fusion module makes the inputs contain as much image content from the different domains as possible. The extracted convolutional features interact in the multiscale cross-modal transformer fusion module to fully complement the associated information, which lets the generator focus on both local and global context. TCGAN fully considers the training efficiency of the adversarial process and the integrated retention of redundant information. Experimental results on public datasets show that TCGAN yields highlighted targets, rich details, and fast convergence.
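The abstract does not specify how the wavelet fusion module combines the source images, so the following is only a minimal sketch of a classical wavelet-domain fusion rule, not the authors' implementation. It assumes a single-level Haar transform, averaging of the low-frequency band, and a max-absolute-value choice for the detail bands; the function names (haar_dwt2, haar_idwt2, wavelet_fuse) are hypothetical and chosen for illustration.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level 2-D Haar transform (assumes even height and width)."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # detail from differences between rows
    hl = (a - b + c - d) / 2.0  # detail from differences between columns
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)


def haar_idwt2(ll, details):
    """Inverse of haar_dwt2: rebuild the image from its four sub-bands."""
    lh, hl, hh = details
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


def wavelet_fuse(img_a, img_b):
    """Fuse two registered, same-size grayscale images in the Haar domain.

    Assumed rule (not taken from the paper): average the approximation
    bands, keep the larger-magnitude coefficient in each detail band.
    """
    ll_a, det_a = haar_dwt2(np.asarray(img_a, dtype=float))
    ll_b, det_b = haar_dwt2(np.asarray(img_b, dtype=float))
    ll = (ll_a + ll_b) / 2.0
    details = tuple(
        np.where(np.abs(da) >= np.abs(db), da, db)
        for da, db in zip(det_a, det_b)
    )
    return haar_idwt2(ll, details)
```

In a pipeline like the one the abstract outlines, such a pre-fused image could plausibly serve as a prior that is passed to the generator together with the source images; whether TCGAN uses this exact rule is not stated in the abstract.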
What problem does this paper attempt to address?