Visible and Infrared Image Fusion Based on Attention and Multiscale Residuals

Zhongxu Xiang, Wentie Yang, Zuoshuai Wang, Yidong Xu, Vladimir Grischenko, Vladimir Korochentsev
DOI: https://doi.org/10.1117/12.3033768
2024-01-01
Abstract: Image fusion is a significant research area that is tied to specific fusion tasks and has broad application prospects. Most existing image fusion algorithms operate at the pixel level. Although deeper convolutional networks have powerful feature extraction capability, their complexity grows with network depth. Convolutional networks also do not fully exploit local dependencies within the image, so detail information is lost in the fused image. In this paper, we propose a network based on a local-attention mechanism with multi-scale residuals to fuse visible and infrared images. The network consists of two key parts: an encoder-decoder and a fusion strategy. We train the network in phases: an autoencoder is first trained to perform feature extraction, local feature enhancement, and feature reconstruction. In the fusion stage, the encoder trained in the first phase extracts features from the visible and infrared images, and these features are then fused by a multi-scale residual network. Finally, the fused features are decoded to obtain the fused image. Experimental comparisons show that our fusion method outperforms existing fusion methods in both visual perception and objective evaluation.
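The phased pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the class names (`Encoder`, `Decoder`, `MultiScaleResidualFusion`), the layer sizes, the spatial gate standing in for the local-attention mechanism, the 3x3/5x5 branches, and the MSE reconstruction loss are all assumptions, since the abstract does not specify them. Training of the fusion block in the second phase is omitted.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Hypothetical encoder: one conv layer plus a simple spatial attention gate."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        # Assumed stand-in for the paper's local-attention mechanism.
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        feats = self.conv(x)
        return feats * self.attn(feats)  # re-weight each spatial location


class Decoder(nn.Module):
    """Hypothetical decoder: maps features back to a single-channel image."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, feats):
        return self.conv(feats)


class MultiScaleResidualFusion(nn.Module):
    """Hypothetical fusion block: parallel 3x3 and 5x5 branches plus a skip path."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, 1)
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.out = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_vis, f_ir):
        merged = self.merge(torch.cat([f_vis, f_ir], dim=1))
        multi = torch.cat(
            [torch.relu(self.branch3(merged)), torch.relu(self.branch5(merged))], dim=1
        )
        return merged + self.out(multi)  # residual connection


def train_autoencoder_step(encoder, decoder, optimizer, image):
    """Phase 1: one reconstruction step (MSE loss is an assumption)."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(image)), image)
    loss.backward()
    optimizer.step()
    return loss.item()


def fuse(encoder, fusion, decoder, visible, infrared):
    """Phase 2 inference: encode both inputs, fuse the features, decode."""
    with torch.no_grad():
        return decoder(fusion(encoder(visible), encoder(infrared)))


torch.manual_seed(0)
encoder, decoder, fusion = Encoder(), Decoder(), MultiScaleResidualFusion()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

# Random tensors stand in for a registered visible/infrared image pair.
visible = torch.rand(1, 1, 32, 32)
infrared = torch.rand(1, 1, 32, 32)

# Phase 1: train the autoencoder on individual images.
for image in (visible, infrared):
    last_loss = train_autoencoder_step(encoder, decoder, optimizer, image)

# Phase 2: reuse the trained encoder with its weights frozen.
for param in encoder.parameters():
    param.requires_grad_(False)

fused = fuse(encoder, fusion, decoder, visible, infrared)
```

In this sketch the same encoder processes both modalities, and the fused output keeps the spatial size of the inputs because every convolution is padded to preserve resolution.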