Combining transformers with CNN for multi-focus image fusion

Zhao Duan, Xiaoliu Luo, Taiping Zhang
DOI: https://doi.org/10.1016/j.eswa.2023.121156
IF: 8.5
2023-08-21
Expert Systems with Applications
Abstract: Recently, deep convolutional neural network (CNN) based methods for multi-focus image fusion have achieved adequate performance. However, most of them cannot obtain spatially continuous results, especially in smooth regions and at the edges between focused and defocused regions. In this paper, we propose a novel end-to-end method that combines the merits of both Transformers and CNNs as a strong alternative for the multi-focus image fusion task. A Transformer has an advantage over a CNN in that it can extract global features, which helps make the fusion results spatially consistent. The proposed architecture consists of CNN and Transformer branches, where the Transformer branches take feature map patches as inputs and leverage the Transformer to propagate global context among patches. Moreover, to improve feature representation, we introduce an online knowledge distillation learning (KDL) strategy, which achieves better interaction between global and local features. Specifically, we design a hard target and a soft target by simply yet effectively ensembling the outputs of the two branches, and use them to supervise the CNN and Transformer branches. The experiments demonstrate the superiority of the proposed architecture, which achieves results competitive with state-of-the-art methods.
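The abstract does not say how the hard and soft targets are formed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each branch outputs a per-pixel focus logit map, that the soft target is the mean of the two branches' probabilities, that the hard target is that mean thresholded at 0.5, and that each branch is trained with a weighted sum of two binary cross-entropy terms. The names `ensemble_targets`, `branch_loss`, `threshold`, and `alpha` are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Map logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def ensemble_targets(cnn_logits, trans_logits, threshold=0.5):
    """Build soft and hard targets from the two branch outputs.

    Assumption: the ensemble is a plain average of probabilities;
    the paper's actual ensembling rule may differ.
    """
    soft = 0.5 * (sigmoid(cnn_logits) + sigmoid(trans_logits))
    hard = (soft >= threshold).astype(np.float64)
    return soft, hard

def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and targets."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(prob)
                           + (1.0 - target) * np.log(1.0 - prob))))

def branch_loss(logits, soft, hard, alpha=0.5):
    """Supervise one branch with both targets (alpha is a hypothetical weight)."""
    prob = sigmoid(logits)
    return alpha * bce(prob, hard) + (1.0 - alpha) * bce(prob, soft)

# Toy 4x4 focus maps standing in for the outputs of the two branches.
rng = np.random.default_rng(0)
cnn_logits = rng.normal(size=(4, 4))
trans_logits = rng.normal(size=(4, 4))

# In a real training loop the targets would be treated as constants
# (no gradient flowing through them) when updating each branch.
soft, hard = ensemble_targets(cnn_logits, trans_logits)
loss_cnn = branch_loss(cnn_logits, soft, hard)
loss_trans = branch_loss(trans_logits, soft, hard)
```

Under these assumptions both branches are pulled toward the same ensembled prediction, which is one way the local (CNN) and global (Transformer) features could be made to interact during training.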
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
What problem does this paper attempt to address?