Multi-modal medical image fusion based on densely-connected high-resolution CNN and hybrid transformer

Quan Zhou, Shaozhuang Ye, Mingwei Wen, Zhiwen Huang, Mingyue Ding, Xuming Zhang
DOI: https://doi.org/10.1007/s00521-022-07635-1
2022-07-29
Neural Computing and Applications
Abstract: Multi-modal medical image fusion (MMIF) is widely used in disease diagnosis and surgical guidance. Although deep learning (DL)-based fusion methods are popular, they do not deliver satisfactory fusion performance because it is difficult to capture both local information and long-range dependencies effectively. To address these issues, this paper presents an unsupervised MMIF method that combines a densely-connected high-resolution network (DHRNet) with a hybrid transformer. In this method, local features are first extracted from the source images using the DHRNet. These features are then fed into the fine-grained attention module of the hybrid transformer, which produces global features by exploring their long-range dependencies. The local and global features are fused by the projection attention module of the hybrid transformer. Finally, the decoder network reconstructs the fused image from the fused features. The network is trained with an unsupervised loss function that includes edge preservation value, structural similarity, sum of the correlations of differences, and structural tensor terms. Experiments on various multi-modal medical images show that, compared with several traditional and DL-based fusion methods, the presented method generates visually better fused results and achieves better quantitative metric values.
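One of the loss terms named in the abstract, the sum of the correlations of differences (SCD), can be illustrated with a short NumPy sketch. It follows the commonly used definition of the SCD metric: the correlation between (fused minus source B) and source A, plus the correlation between (fused minus source A) and source B. The function names `correlation` and `scd` are hypothetical, and the abstract does not state how the paper weights or transforms this term inside its loss, so this is only a sketch of the metric itself, not the authors' implementation.

```python
import numpy as np


def correlation(x, y):
    """Pearson correlation between two same-shaped images."""
    xd = x - x.mean()
    yd = y - y.mean()
    denom = np.sqrt((xd ** 2).sum() * (yd ** 2).sum())
    # Return 0 when either image is constant, to avoid division by zero.
    return float((xd * yd).sum() / denom) if denom > 0 else 0.0


def scd(src_a, src_b, fused):
    """Sum of the correlations of differences for two sources and a fused image."""
    a = np.asarray(src_a, dtype=np.float64)
    b = np.asarray(src_b, dtype=np.float64)
    f = np.asarray(fused, dtype=np.float64)
    # (f - b) should resemble a, and (f - a) should resemble b.
    return correlation(f - b, a) + correlation(f - a, b)


# Synthetic stand-ins for two registered source images (assumed data).
rng = np.random.default_rng(0)
image_a = rng.random((64, 64))
image_b = rng.random((64, 64))

score_sum = scd(image_a, image_b, image_a + image_b)
score_copy = scd(image_a, image_b, image_a)
```

When the fused image equals the sum of the two sources, each difference recovers the other source, so the score reaches its maximum of 2. A fused image that simply copies one source scores lower, which is why a higher SCD is read as more complementary information transferred from both inputs.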
computer science, artificial intelligence