Abstract: Owing to the limitations of imaging sensors, it is challenging to obtain a medical image that simultaneously contains functional metabolic information and structural tissue details. Multimodal medical image fusion, an effective way to merge the complementary information in different modalities, has become a significant technique to facilitate clinical diagnosis and surgical navigation. With powerful feature representation ability, deep learning (DL)-based methods have improved fusion results but have still not achieved satisfactory performance. Specifically, existing DL-based methods generally depend on convolutional operations, which extract local patterns well but have limited capability in preserving global context information. To compensate for this defect and achieve accurate fusion, we propose a novel unsupervised method to fuse multimodal medical images via a multiscale adaptive Transformer, termed MATR. In the proposed method, instead of directly employing vanilla convolution, we introduce an adaptive convolution that modulates the convolutional kernel based on the global complementary context. To further model long-range dependencies, an adaptive Transformer is employed to enhance the global semantic extraction capability. Our network architecture is designed in a multiscale fashion so that useful multimodal information can be adequately acquired at different scales. Moreover, an objective function composed of a structural loss and a region mutual information loss is devised to constrain information preservation at both the structural level and the feature level. Extensive experiments on a mainstream database demonstrate that the proposed method outperforms other representative and state-of-the-art methods in terms of both visual quality and quantitative evaluation.
We also extend the proposed method to address other biomedical image fusion issues, and the pleasing fusion results illustrate that MATR has good generalization capability. The code of the proposed method is available at https://github.com/tthinking/MATR.
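The abstract does not specify the form of the structural loss, so the following is only a minimal sketch under stated assumptions: it assumes an SSIM-style measure computed over a single global window (rather than local sliding windows), equal weighting of the two source modalities, and intensities normalized to [0, 1]. The function names `global_ssim` and `structural_loss` are hypothetical and are not taken from the MATR code; the region mutual information term is not shown.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0):
    """SSIM over one global window for two equally sized flat intensity lists.

    Assumption: a single window covering the whole image; practical SSIM
    implementations usually average over local windows instead.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and equally sized")
    # Standard SSIM stabilizing constants.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def structural_loss(fused, source_a, source_b):
    """Hypothetical structural loss: mean of (1 - SSIM) against each source.

    Assumption: both modalities are weighted equally.
    """
    loss_a = 1.0 - global_ssim(fused, source_a)
    loss_b = 1.0 - global_ssim(fused, source_b)
    return 0.5 * (loss_a + loss_b)


if __name__ == "__main__":
    # Toy 2x2 "images" flattened to lists, purely for illustration.
    fused = [0.1, 0.5, 0.9, 0.3]
    mri = [0.1, 0.5, 0.9, 0.3]
    pet = [0.9, 0.5, 0.1, 0.7]
    print(structural_loss(fused, mri, pet))
```

In this sketch the loss is zero only when the fused image is structurally identical to both sources, so minimizing it pushes the fused result toward a compromise that retains structure from each modality.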

MdcFormer: Transformers Based on Dynamic Weights and Multi-Scale for Medical Image Segmentation

Mmformer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation

ScaleFormer: Revisiting the Transformer-based Backbones from a Scale-wise Perspective for Medical Image Segmentation

MixFormer: a Mixed CNN-Transformer Backbone for Medical Image Segmentation

MDC-RHT: Multi-Modal Medical Image Fusion via Multi-Dimensional Dynamic Convolution and Residual Hybrid Transformer

A Dynamic Cross-Scale Transformer with Dual-Compound Representation for 3D Medical Image Segmentation

A Lightweight Multi-Scale Multi-Angle Dynamic Interactive Transformer-CNN Fusion Model for 3D Medical Image Segmentation

Multi-dimension Transformer with Attention-based Filtering for Medical Image Segmentation

NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation

MMMViT: Multiscale multimodal vision transformer for brain tumor segmentation with missing modalities

MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation

Coformer: Collaborative Transformer for Medical Image Segmentation

HMDA: A Hybrid Model with Multi-scale Deformable Attention for Medical Image Segmentation

MATR: Multimodal Medical Image Fusion Via Multiscale Adaptive Transformer

MAXFormer: Enhanced Transformer for Medical Image Segmentation with Multi-Attention and Multi-Scale Features Fusion

DTMFormer: Dynamic Token Merging for Boosting Transformer-Based Medical Image Segmentation

MACTFusion: Lightweight Cross Transformer for Adaptive Multimodal Medical Image Fusion

Hybrid-Fusion Transformer for Multisequence MRI

M2FTrans: Modality-Masked Fusion Transformer for Incomplete Multi-Modality Brain Tumor Segmentation

DS-Former: A Dual-Stream Encoding-Based Transformer for 3D Medical Image Segmentation

GCFormer: Multi-scale Feature Plays a Crucial Role in Medical Images Segmentation