MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion

Zhe Li, Haiwei Pan, Kejia Zhang, Yuhua Wang, Fengming Yu

2024-04-12

Abstract: Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image to represent the imaging scene and facilitate downstream visual tasks comprehensively. In recent years, significant progress has been made in MMIF tasks due to advances in deep neural networks. However, existing methods cannot effectively and efficiently extract modality-specific and modality-fused features constrained by the inherent local reductive bias (CNN) or quadratic computational complexity (Transformers). To overcome this issue, we propose a Mamba-based Dual-phase Fusion (MambaDFuse) model. Firstly, a dual-level feature extractor is designed to capture long-range features from single-modality images by extracting low and high-level features from CNN and Mamba blocks. Then, a dual-phase feature fusion module is proposed to obtain fusion features that combine complementary information from different modalities. It uses the channel exchange method for shallow fusion and the enhanced Multi-modal Mamba (M3) blocks for deep fusion. Finally, the fused image reconstruction module utilizes the inverse transformation of the feature extraction to generate the fused result. Through extensive experiments, our approach achieves promising fusion results in infrared-visible image fusion and medical image fusion. Additionally, in a unified benchmark, MambaDFuse has also demonstrated improved performance in downstream tasks such as object detection. Code with checkpoints will be available after the peer-review process.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses multi-modality image fusion (MMIF), which integrates complementary information from images of different modalities into a single high-quality fused image. Existing methods struggle to extract modality-specific and modality-fused features both effectively and efficiently: convolutional neural networks (CNNs) are constrained by their local reductive bias, and Transformers by their quadratic computational complexity. To address these issues, the paper proposes a Mamba-based dual-phase fusion model (MambaDFuse) with three stages:

1. Dual-level feature extraction: CNN and Mamba blocks are combined to extract low-level and high-level features. The CNN captures local features, and the Mamba blocks extract long-range features.
2. Dual-phase feature fusion: the shallow fusion module uses channel exchange to fuse global overview features, while the deep fusion module uses the enhanced Multi-modal Mamba (M3) block to fuse local detail features across modalities.
3. Fused image reconstruction: the inverse transformation of the feature extraction generates the fused result.

Experiments show promising results for MambaDFuse in infrared-visible image fusion and medical image fusion, as well as improved performance in downstream tasks such as object detection. According to the summary, MambaDFuse improves on existing methods in both efficiency and effectiveness. The main contribution is presented as the first application of Mamba to the MMIF task, with a feature extraction and fusion mechanism designed around it.
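The channel exchange used for shallow fusion can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `channel_exchange`, the `period` parameter, and the rule of swapping every `period`-th channel are assumptions made for illustration, since the summary does not state how channels are selected.

```python
import numpy as np

def channel_exchange(feat_a, feat_b, period=2):
    """Swap every `period`-th channel between two (C, H, W) feature maps.

    Hypothetical helper: the fixed-period selection rule is an assumption,
    not the rule used in the paper.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    # Boolean mask over the channel axis: True marks channels to swap.
    swap = (np.arange(feat_a.shape[0]) % period) == 0
    mask = swap[:, None, None]  # broadcast over height and width
    out_a = np.where(mask, feat_b, feat_a)
    out_b = np.where(mask, feat_a, feat_b)
    return out_a, out_b

# Toy feature maps standing in for infrared and visible features.
ir = np.zeros((4, 2, 2))
vis = np.ones((4, 2, 2))
ir_x, vis_x = channel_exchange(ir, vis)
```

After the call, channels 0 and 2 of `ir_x` come from `vis` and channels 1 and 3 are unchanged, so each branch carries some of the other modality's features with no extra parameters. In this sketch the exchange only moves values between the two branches. Nothing is discarded.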

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion

FusionMamba: Efficient Image Fusion with State Space Model

Fusion-Mamba for Cross-modality Object Detection

FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion

MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification

A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion

Multi-Modal Image Fusion Via Deep Laplacian Pyramid Hybrid Network

MDC-RHT: Multi-Modal Medical Image Fusion via Multi-Dimensional Dynamic Convolution and Residual Hybrid Transformer

CIRF: Coupled Image Reconstruction and Fusion Strategy for Deep Learning Based Multi-Modal Image Fusion

BCMFIFuse: A Bilateral Cross-Modal Feature Interaction-Based Network for Infrared and Visible Image Fusion

MMR-Mamba: Multi-Modal MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

MIFFuse: A Multi-Level Feature Fusion Network for Infrared and Visible Images

MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion

Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation

Multi-focus image fusion with parameter adaptive dual channel dynamic threshold neural P systems

DPACFuse: Dual-Branch Progressive Learning for Infrared and Visible Image Fusion with Complementary Self-Attention and Convolution

DM-Fusion: Deep Model-Driven Network for Heterogeneous Image Fusion