FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Xinyu Xie, Yawen Cui, Chio-In Ieong, Tao Tan, Xiaozhi Zhang, Xubin Zheng, Zitong Yu
2024-04-21
Abstract: Multimodal image fusion aims to combine information from different modalities into a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks are limited in capturing global image features because they rely on local convolution operations. Transformer-based models excel at global feature modeling but face computational challenges due to their quadratic complexity. Recently, the Selective Structured State Space Model has shown significant potential for modeling long-range dependencies with linear complexity, offering a promising way out of this dilemma. In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba. Specifically, we devise an improved, efficient Mamba model for image fusion that integrates an efficient visual state space model with dynamic convolution and channel attention. This refined model preserves Mamba's performance and global modeling capability while reducing channel redundancy and strengthening local feature extraction. Additionally, we devise a dynamic feature fusion module (DFFM) comprising two dynamic feature enhancement modules (DFEM) and a cross-modality fusion Mamba module (CMFM). The DFEMs perform dynamic texture enhancement and dynamic difference perception, whereas the CMFM enhances correlated features between modalities and suppresses redundant intermodal information. FusionMamba achieves state-of-the-art (SOTA) performance on various multimodal medical image fusion tasks (CT-MRI, PET-MRI, SPECT-MRI), the infrared and visible image fusion task (IR-VIS), and a multimodal biomedical image fusion dataset (GFP-PC), demonstrating the model's generalization ability. The code for FusionMamba is available at
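The linear-complexity claim for selective state space models comes from their recurrent form: each token updates a fixed-size hidden state once, so a length-L sequence costs O(L) work instead of the O(L²) of self-attention. The sketch below is a minimal, single-channel illustration of a selective (input-dependent) scan in plain Python, assuming scalar inputs and the simple Δ·B discretization of B. It is not FusionMamba's code; the parameter names (`w_b`, `w_c`, `w_delta`, `b_delta`) are hypothetical. Visual state space models typically flatten a 2D feature map into one or more 1D sequences before running such a scan.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z); keeps the step size Delta positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: List[float],
    a: List[float],
    w_b: List[float],
    w_c: List[float],
    w_delta: float,
    b_delta: float,
    d: float = 0.0,
) -> List[float]:
    """Simplified single-channel selective SSM scan (S6-style recurrence).

    x                : input sequence of length L (one channel)
    a                : diagonal of the continuous state matrix A (negative, length N)
    w_b, w_c         : hypothetical projections making B_t and C_t depend on x_t
    w_delta, b_delta : hypothetical projection producing the step size Delta_t
    d                : skip-connection weight D
    """
    n = len(a)
    h = [0.0] * n  # hidden state of fixed size N
    y: List[float] = []
    for x_t in x:  # single pass over the sequence: O(L * N) work
        delta = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        b_t = [wb * x_t for wb in w_b]  # input-dependent B_t
        c_t = [wc * x_t for wc in w_c]  # input-dependent C_t
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # zero-order-hold discretization of A
            b_bar = delta * b_t[i]  # simplified (Delta * B) discretization of B
            h[i] = a_bar * h[i] + b_bar * x_t
        y.append(sum(c * s for c, s in zip(c_t, h)) + d * x_t)
    return y


y = selective_scan([1.0, 0.0, 0.0, 2.0], a=[-1.0, -0.5], w_b=[1.0, 0.5],
                   w_c=[1.0, 1.0], w_delta=0.0, b_delta=0.0)
```

Because Δ_t, B_t and C_t are computed from x_t, the scan can decide per token how much past state to keep: with negative A, a larger Δ_t shrinks exp(Δ_t·A) and forgets the previous state faster. This input dependence is what separates a selective SSM from a fixed linear filter, while the cost stays linear in sequence length.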
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses key issues in multimodal image fusion, particularly medical image fusion. Methods based on Convolutional Neural Networks (CNNs) struggle to capture global features, while Transformer-based methods model global context well but are computationally expensive and weaker at capturing local details. The paper proposes FusionMamba, a dynamic feature enhancement model that builds on the Mamba architecture to address these issues.

### Main Issues

1. **Limitations of Existing Methods**:
   - CNN-based methods are limited in capturing global features because they rely on local convolution operations.
   - Transformer-based methods, while good at global feature modeling, have quadratic computational complexity and are less effective than CNNs at capturing local details.
2. **Insufficient Feature Fusion**:
   - Current fusion methods fail to extract features from the different modalities effectively, which lowers fusion performance.

### Solution

FusionMamba addresses these issues with three components:

1. **Dynamic Feature Enhancement Module (DFEM)**: dynamically enhances texture details in the source images and perceives differences between modalities.
2. **Cross-Modal Fusion Mamba Module (CMFM)**: mines correlated features across modalities and suppresses redundant inter-modal information. Two DFEMs and one CMFM together form the dynamic feature fusion module (DFFM).
3. **Dynamic Visual State Space Module (DVSS)**: improves the standard Mamba block by strengthening local feature extraction and reducing channel redundancy.

With these improvements, FusionMamba achieves strong results across a range of multimodal image fusion tasks: infrared and visible image fusion, CT-MRI, PET-MRI and SPECT-MRI medical image fusion, and biomedical (GFP-PC) image fusion. Experimental results show that FusionMamba outperforms existing state-of-the-art methods across multiple evaluation metrics.
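The DVSS description says the improved Mamba block gains local feature extraction and loses channel redundancy, and the abstract attributes this to dynamic convolution and channel attention, but neither gives formulas. The sketch below shows one common textbook form of each, as an assumption rather than the paper's design: dynamic convolution as an input-dependent softmax mixture of candidate 3x3 kernels, and channel attention in squeeze-and-excitation style. It works on small nested lists in pure Python; the helper names and weight arguments (`w_att`, `w1`, `w2`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W feature channel


def mean2d(m: Matrix) -> float:
    """Global average pooling of one channel."""
    return sum(sum(row) for row in m) / (len(m) * len(m[0]))


def conv2d_same(img: Matrix, k: Matrix) -> Matrix:
    """3x3 cross-correlation with zero padding; output has the input's size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di + 1][dj + 1] * img[ii][jj]
            out[i][j] = acc
    return out


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dynamic_conv(img: Matrix, kernel_bank: List[Matrix], w_att: List[float]) -> Matrix:
    """Mix K candidate 3x3 kernels with input-dependent weights, then convolve.

    Attention logits are a hypothetical linear map of the channel's global mean.
    """
    pi = softmax([w * mean2d(img) for w in w_att])  # one weight per kernel, sums to 1
    mixed = [[sum(p * k[r][c] for p, k in zip(pi, kernel_bank)) for c in range(3)]
             for r in range(3)]
    return conv2d_same(img, mixed)


def channel_attention(feats: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Squeeze-and-excitation style gating of C channels.

    w1: (C_r x C) reduction weights, w2: (C x C_r) expansion weights (hypothetical).
    """
    z = [mean2d(f) for f in feats]  # squeeze: one descriptor per channel
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]  # FC + ReLU
    gates = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, hidden))))
             for row in w2]  # FC + sigmoid: one gate in (0, 1) per channel
    return [[[g * v for v in row] for row in f] for g, f in zip(gates, feats)]
```

In a trained network these weights are learned and the operations run over multi-channel tensors in a deep learning framework; the explicit loops here only make the arithmetic visible. The design intent is that dynamic convolution adapts the local filter to each input, while channel attention down-weights channels that carry little information.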