Abstract: Multi-modal image fusion aims to combine information from different modalities into a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks are limited in capturing global image features because they rely on local convolution operations. Transformer-based models excel at global feature modeling but face computational challenges stemming from their quadratic complexity. Recently, the Selective Structured State Space Model has shown significant potential for long-range dependency modeling with linear complexity, offering a promising way to resolve this dilemma. In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba. Specifically, we devise an improved, efficient Mamba model for image fusion that integrates an efficient visual state space model with dynamic convolution and channel attention. This refined model preserves Mamba's performance and global modeling capability while reducing channel redundancy and strengthening local feature enhancement. Additionally, we devise a dynamic feature fusion module (DFFM) comprising two dynamic feature enhancement modules (DFEM) and a cross-modality fusion Mamba module (CMFM). The former performs dynamic texture enhancement and dynamic difference perception, whereas the latter enhances correlated features between modalities and suppresses redundant intermodal information. FusionMamba achieves state-of-the-art (SOTA) performance across various multimodal medical image fusion tasks (CT-MRI, PET-MRI, SPECT-MRI), the infrared and visible image fusion task (IR-VIS), and a multimodal biomedical image fusion dataset (GFP-PC), demonstrating the model's generalization ability. The code for FusionMamba is available at
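To make the linear-complexity claim concrete, here is a minimal sketch of a selective state-space scan over a 1-D sequence. It is a toy scalar recurrence, not the FusionMamba model or the Mamba implementation: the function name `selective_scan`, the parameters `w_delta`, `a`, and `c`, and the simplified discretization are all assumptions made for illustration.

```python
import math


def selective_scan(xs, w_delta=1.0, a=-1.0, c=1.0):
    """Toy scalar selective state-space scan (illustrative assumption only).

    Recurrence: h_t = a_bar_t * h_{t-1} + b_bar_t * x_t,  y_t = c * h_t.
    The step size delta_t depends on the input x_t, which is what makes
    the scan "selective". One pass over the sequence gives O(L) cost.
    """
    h = 0.0
    ys = []
    for x in xs:
        # Input-dependent step size via softplus, so delta > 0.
        delta = math.log1p(math.exp(w_delta * x))
        # Discretized state decay; lies in (0, 1) when a < 0.
        a_bar = math.exp(delta * a)
        # Simplified (Euler-style) input scaling, an assumption of this sketch.
        b_bar = delta
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    print(selective_scan([0.5, -1.0, 2.0, 0.0]))
```

Each output depends on every earlier input through the running state `h`, yet the loop touches each element once. Self-attention, by contrast, compares every pair of positions, which is where the quadratic cost mentioned above comes from.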

Multimodal dynamic fusion framework: Multilevel feature fusion guided by prompts

Efficient Multimodal Fusion Via Interactive Prompting

Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation

Progressive Fusion for Multimodal Integration

Conditional Prompt Tuning for Multimodal Fusion

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Provable Dynamic Fusion for Low-Quality Multimodal Data

CMCI: A Robust Multimodal Fusion Method for Spiking Neural Networks

Dual Low-Rank Multimodal Fusion

Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis

Predictive Dynamic Fusion

Prompt Link Multimodal Fusion in Multimodal Sentiment Analysis

MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion

MMTM: Multimodal Transfer Module for CNN Fusion

Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation

Multi-focus image fusion with parameter adaptive dual channel dynamic threshold neural P systems

MEFusion: Unsupervised Mutual Enhancement for Multimodal Image Fusion

Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion

Dense Multimodal Fusion for Hierarchically Joint Representation