MDC-RHT: Multi-Modal Medical Image Fusion via Multi-Dimensional Dynamic Convolution and Residual Hybrid Transformer

Wenqing Wang, Ji He, Han Liu, Wei Yuan
DOI: https://doi.org/10.3390/s24134056
IF: 3.9
2024-06-22
Sensors
Abstract: The fusion of multi-modal medical images is of great significance for comprehensive diagnosis and treatment. However, the large differences between the various modalities of medical images make multi-modal medical image fusion a great challenge. This paper proposes a novel multi-scale fusion network based on multi-dimensional dynamic convolution and a residual hybrid transformer, which has better capability for feature extraction and context modeling and improves fusion performance. Specifically, the proposed network exploits multi-dimensional dynamic convolution, which introduces four attention mechanisms corresponding to four different dimensions of the convolutional kernel to extract more detailed information. Meanwhile, a residual hybrid transformer is designed, which activates more pixels to participate in the fusion process through channel attention, window attention, and overlapping cross attention, thereby strengthening the long-range dependence between different modalities and enhancing the connection of global context information. A loss function including perceptual loss and structural similarity loss is designed, where the former enhances the visual realism and perceptual details of the fused image, and the latter enables the model to learn structural textures. The whole network adopts a multi-scale architecture and uses an unsupervised end-to-end method to realize multi-modal image fusion. Finally, our method is tested qualitatively and quantitatively on mainstream datasets. The fusion results indicate that our method achieves high scores on most quantitative indicators and satisfactory performance in visual qualitative analysis.
Engineering, Electrical & Electronic; Chemistry, Analytical; Instruments & Instrumentation
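The abstract describes multi-dimensional dynamic convolution (MDC) only as "four attention mechanisms corresponding to four different dimensions of the convolutional kernel". Those four dimensions (the kernel's spatial positions, the input channels, the output channels, and the number of kernels) match the omni-dimensional dynamic convolution (ODConv) pattern, so the sketch below follows that pattern as an assumption. The pooling, hidden size, activations, initialization, and the class name `MultiDimDynamicConv` are illustrative choices, not taken from the paper. It is a single-image NumPy forward pass showing how four attention vectors rescale a bank of candidate kernels into one input-dependent kernel before an ordinary convolution.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def conv2d_same(x, w):
    """Naive zero-padded 'same' 2-D cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((w.shape[0], h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


class MultiDimDynamicConv:
    """Hypothetical ODConv-style layer: 4 attentions over a bank of n kernels."""

    def __init__(self, c_in, c_out, k=3, n_kernels=4, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Bank of candidate kernels: (n, C_out, C_in, k, k)
        self.kernels = rng.normal(0.0, 0.1, (n_kernels, c_out, c_in, k, k))
        # Shared squeeze layer followed by four attention heads
        self.fc = rng.normal(0.0, 0.1, (hidden, c_in))
        self.head_s = rng.normal(0.0, 0.1, (k * k, hidden))      # spatial positions
        self.head_c = rng.normal(0.0, 0.1, (c_in, hidden))       # input channels
        self.head_f = rng.normal(0.0, 0.1, (c_out, hidden))      # output channels
        self.head_w = rng.normal(0.0, 0.1, (n_kernels, hidden))  # kernel count

    def attentions(self, x):
        # Global average pooling -> FC -> ReLU gives an input-dependent descriptor
        z = np.maximum(self.fc @ x.mean(axis=(1, 2)), 0.0)
        a_s = sigmoid(self.head_s @ z).reshape(self.k, self.k)
        a_c = sigmoid(self.head_c @ z)
        a_f = sigmoid(self.head_f @ z)
        a_w = softmax(self.head_w @ z)  # weights over the kernel bank sum to 1
        return a_s, a_c, a_f, a_w

    def __call__(self, x):
        a_s, a_c, a_f, a_w = self.attentions(x)
        # Rescale every kernel along all four dimensions, then sum over the bank
        scaled = (self.kernels
                  * a_s[None, None, None, :, :]
                  * a_c[None, None, :, None, None]
                  * a_f[None, :, None, None, None]
                  * a_w[:, None, None, None, None])
        return conv2d_same(x, scaled.sum(axis=0))


x = np.random.default_rng(1).normal(size=(2, 8, 8))  # e.g. a 2-channel feature map
mdc = MultiDimDynamicConv(c_in=2, c_out=4)
y = mdc(x)
print(y.shape)  # (4, 8, 8)
```

Because convolution is linear in the kernel, aggregating the attention-scaled kernels first and convolving once gives the same result as convolving with each scaled kernel and summing the outputs, which is why dynamic-convolution designs typically aggregate first: the cost stays close to that of a single static convolution.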
What problem does this paper attempt to address?
The paper addresses multi-modal medical image fusion, in particular the fusion of Magnetic Resonance Imaging (MRI) with functional imaging such as Positron Emission Tomography (PET) or Single Photon Emission Computed Tomography (SPECT). Such fusion is valuable for clinical diagnosis and treatment, but the large differences between modalities make it difficult.
To address this, the authors propose MDC-RHT (Multi-Dimensional Dynamic Convolution and Residual Hybrid Transformer), a multi-scale fusion network built from two components. Multi-Dimensional Dynamic Convolution (MDC) improves feature extraction by applying four attention mechanisms, one for each dimension of the convolution kernel: its spatial extent, the number of input channels, the number of output channels, and the number of convolution kernels. The Residual Hybrid Transformer (RHT) combines channel attention, window attention, and overlapping cross attention so that more pixels participate in the fusion process, which strengthens long-range dependencies between modalities and the connection of global contextual information.
The paper also proposes a loss function that combines a perceptual loss, which enhances the visual realism and perceptual detail of the fused image, with a structural similarity (SSIM) loss, which helps the model learn the structural texture of the image. The whole network is trained end to end in an unsupervised manner.
Finally, the method was tested qualitatively and quantitatively on mainstream datasets, where it achieved high scores on most quantitative metrics and satisfactory visual quality.
This indicates that the MDC-RHT network can improve local accuracy while maintaining global consistency of the image, thereby producing high-quality fusion results.
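The summary names the two loss terms but not their exact form, their weights, the feature network used for the perceptual term, or how the fused image is compared with each source. The following is a minimal NumPy sketch under stated assumptions: the fused image is compared with both source images, the total loss is a weighted sum `lam_per * L_per + lam_ssim * L_ssim` with placeholder weights of 1, SSIM is averaged over uniform 7×7 windows, and the perceptual term uses a hypothetical `gradient_features` stand-in where a real implementation would typically use activations from a pretrained network such as VGG.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim(x, y, win=7, data_range=1.0):
    """Mean SSIM over all win x win windows (uniform window, no Gaussian weighting)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    xw = sliding_window_view(x, (win, win))
    yw = sliding_window_view(y, (win, win))
    mu_x = xw.mean(axis=(-2, -1))
    mu_y = yw.mean(axis=(-2, -1))
    var_x = (xw * xw).mean(axis=(-2, -1)) - mu_x ** 2
    var_y = (yw * yw).mean(axis=(-2, -1)) - mu_y ** 2
    cov = (xw * yw).mean(axis=(-2, -1)) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(np.mean(num / den))


def gradient_features(img):
    """Hypothetical stand-in for pretrained-network features (e.g. VGG activations)."""
    gy, gx = np.gradient(img)
    return np.stack([img, gx, gy])


def perceptual_loss(fused, src, feature_fn=gradient_features):
    # MSE between feature maps of the fused image and one source image
    return float(np.mean((feature_fn(fused) - feature_fn(src)) ** 2))


def fusion_loss(fused, mri, func, lam_per=1.0, lam_ssim=1.0):
    """Assumed form: weighted sum of perceptual and SSIM terms against both sources."""
    l_per = 0.5 * (perceptual_loss(fused, mri) + perceptual_loss(fused, func))
    l_ssim = 1.0 - 0.5 * (ssim(fused, mri) + ssim(fused, func))
    return lam_per * l_per + lam_ssim * l_ssim


rng = np.random.default_rng(0)
mri, pet = rng.random((32, 32)), rng.random((32, 32))  # placeholder images in [0, 1]
fused = 0.5 * (mri + pet)  # naive average standing in for a network output
print(fusion_loss(fused, mri, pet))
```

Since SSIM equals 1 for identical images, the structural term is 0 only when the fused image matches both sources; because the sources differ in practice, minimizing this loss trades off how much structure and feature content is preserved from each modality.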