Infrared and Visible Image Fusion Based on a Two-Stage Class Conditioned Auto-Encoder Network.

Yanpeng Cao, Xing Luo, Xi Tong, Jiangxin Yang, Yanlong Cao
DOI: https://doi.org/10.1016/j.neucom.2023.126248
IF: 6
2023-01-01
Neurocomputing
Abstract: Existing auto-encoder based infrared and visible image fusion methods typically use a shared encoder to extract features from different modalities and adopt a handcrafted fusion strategy to fuse the extracted features into an intermediate representation before the decoder. In this paper, we present a novel two-stage class conditioned auto-encoder framework for high-quality multispectral fusion tasks. In the first training stage, we introduce a class embedding sub-branch to the encoder network that models the characteristics of different modalities and adaptively scales the intermediate features based on the input modality. Moreover, we design a cross-transfer residual block to promote the flow of content and texture information in the encoder, generating more representative features. In the second training stage, we insert a learnable fusion module between the pre-trained class conditioned encoder and decoder to replace the handcrafted fusion strategy. Specific intensity and gradient loss functions are used to tune the model so that distinctive deep features are fused in a data-driven manner. With these designs, namely the class conditioned auto-encoder and the two-stage training strategy, our proposed TS-ClassFuse better preserves distinctive information from the source images and reduces the training difficulty of simultaneously extracting informative features and determining the optimal fusion scheme. Experimental results verify the effectiveness of our method in both qualitative and quantitative evaluations.
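The class-conditioned scaling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the class embedding modulates the features, so the sigmoid gate, the embedding values, and the names `CLASS_EMBEDDINGS` and `class_conditioned_scale` are all assumptions made for illustration.

```python
import math

# Hypothetical embedding table: one vector of per-channel logits per modality.
# The numbers are invented for illustration; a trained model would learn them.
CLASS_EMBEDDINGS = {
    "infrared": [2.0, -2.0, 0.0],
    "visible": [-2.0, 2.0, 0.0],
}


def sigmoid(x):
    """Map a logit to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def class_conditioned_scale(features, modality):
    """Scale each feature channel by a gate derived from the modality embedding.

    features: list of channels, each channel a 2D list of floats.
    modality: key into CLASS_EMBEDDINGS (assumed gating scheme, not from the paper).
    """
    logits = CLASS_EMBEDDINGS[modality]
    if len(logits) != len(features):
        raise ValueError("one gate per channel is required")
    gates = [sigmoid(v) for v in logits]
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, features)
    ]


# Toy 3-channel, 2x2 feature map shared by both modalities.
features = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
ir_scaled = class_conditioned_scale(features, "infrared")
vis_scaled = class_conditioned_scale(features, "visible")
```

In this toy setup the same shared features are emphasized differently per modality: the first channel is weighted up for the infrared input and down for the visible input, which is the general idea of conditioning a shared encoder on the input class.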