MFT: Multi-scale Fusion Transformer for Infrared and Visible Image Fusion

Chen-Ming Zhang, Chengbo Yuan, Yong Luo, Xin Zhou
DOI: https://doi.org/10.1007/978-3-031-44223-0_39
2023-01-01
Abstract: This paper studies the problem of fusing infrared and visible images to improve the quality of the target image. Traditional image fusion algorithms usually rely on convolutional neural networks (CNNs) for feature extraction and fusion, and thus can only exploit local information. Some recent approaches combine CNNs and Transformers to capture long-range dependencies, but the global contextual information in the images still cannot be fully exploited. To improve the ability to capture global information, we propose a novel multi-scale fusion transformer (MFT) to fuse infrared and visible images. In the encoder of our MFT, a multi-head pooling attention module is utilized to extract both local features and long-range dependencies from the input image. Then a novel dual-branch fusion module is designed to simultaneously exploit the global contextual information and the infrared-visible complementary information in the fusion process. Experimental results show that the proposed method can effectively improve the subjective visual experience of the infrared-visible fused image, and that it outperforms many recent and competitive counterparts in terms of most objective evaluation criteria.
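The abstract names a multi-head pooling attention module but gives no implementation details. As a rough illustration of the general idea only, the sketch below assumes that "pooling attention" means attention in which keys and values are average-pooled to a shorter sequence before the dot-product step. The function names (`avg_pool_tokens`, `pooling_attention`), the 1-D token layout, the identity projections, and the `stride` parameter are all assumptions made for this example and are not taken from the paper.

```python
import math


def avg_pool_tokens(tokens, stride):
    """Average non-overlapping groups of `stride` consecutive tokens."""
    pooled = []
    for start in range(0, len(tokens), stride):
        group = tokens[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pooling_attention(tokens, num_heads=2, stride=2):
    """Hypothetical multi-head attention with pooled keys and values.

    Assumptions (not from the paper): queries are the raw tokens, keys and
    values are the average-pooled tokens, and no learned projections are used.
    """
    dim = len(tokens[0])
    assert dim % num_heads == 0, "channel count must divide evenly across heads"
    head_dim = dim // num_heads
    pooled = avg_pool_tokens(tokens, stride)  # shorter key/value sequence

    output = [[] for _ in tokens]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        keys = [t[lo:hi] for t in pooled]
        values = keys  # identity projections: keys and values coincide
        for i, token in enumerate(tokens):
            q = token[lo:hi]
            # Scaled dot-product scores of this query against every pooled key.
            scores = [
                sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim)
                for k in keys
            ]
            weights = softmax(scores)
            # Weighted sum of pooled values, concatenated across heads.
            output[i].extend(
                sum(w * v[d] for w, v in zip(weights, values))
                for d in range(head_dim)
            )
    return output
```

Under these assumptions, every query still attends over the whole (pooled) sequence, so long-range context is retained, while the pooling step aggregates neighbouring tokens and reduces the number of key/value pairs by roughly a factor of `stride`.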