A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion

Zihan Cao, Xiao Wu, Liang-Jian Deng, Yu Zhong

2024-08-22

Abstract: In image fusion tasks, images from different sources possess distinct characteristics. This has driven the development of numerous methods that explore better ways of fusing them while preserving their respective characteristics. Mamba, a state space model, emerged in the field of natural language processing, and many recent studies have attempted to extend it to vision tasks. However, because images differ in nature from causal language sequences, the limited state capacity of Mamba weakens its ability to model image information. Additionally, the sequence modeling of Mamba captures only spatial information and cannot effectively capture the rich spectral information in images. Motivated by these challenges, we customize and improve the vision Mamba network for the image fusion task. Specifically, we propose the local-enhanced vision Mamba block, dubbed LEVM. The LEVM block improves the network's perception of local information and learns local and global spatial information simultaneously. Furthermore, we propose a state sharing technique to enhance spatial details and integrate spatial and spectral information. Finally, the overall network is a multi-scale structure based on vision Mamba, called LE-Mamba. Extensive experiments show that the proposed method achieves state-of-the-art results on multispectral pansharpening and multispectral and hyperspectral image fusion datasets, demonstrating the effectiveness of the approach. Code is available at https://github.com/294coder/Efficient-MIF.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address several key issues in the task of image fusion:

1. **Local Information Perception Enhancement**: Existing vision Mamba models have limited capability in handling local information, especially in image fusion tasks that require simultaneous processing of both local and global spatial information. The authors therefore propose a Local-Enhanced Vision Mamba (LEVM) block to improve local information perception.
2. **State Sharing Technique**: Existing methods lose information when processing high-resolution images and cannot effectively capture both the spatial and the spectral information of images. The authors design a state sharing technique to reduce this information loss and to learn spatial and spectral information simultaneously.

With these improvements, the LE-Mamba network achieves state-of-the-art performance in tasks such as multispectral pansharpening and multispectral and hyperspectral image fusion. Specifically, LE-Mamba is based on the U-Net architecture and introduces LEVM blocks and the state sharing technique into the encoder-decoder structure, achieving excellent results on multiple benchmark datasets.
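To make the terms "state space model", "local" and "state sharing" more concrete, here is a minimal NumPy sketch. It is not the authors' LEVM block or their state sharing mechanism, neither of which is specified in this summary. It assumes a plain linear recurrence with a diagonal state matrix, and the names `ssm_scan` and `window_order` are hypothetical. It shows a global scan over a flattened image, a scan that visits pixels window by window, and a second scan that starts from the state left behind by the first.

```python
import numpy as np

def ssm_scan(x, A, B, C, h0=None):
    """Linear state space recurrence over a 1-D sequence (illustrative only).

    h_t = A * h_{t-1} + B * x_t,   y_t = C . h_t
    A, B, C are length-N vectors (diagonal state matrix); x has shape (T,).
    Returns the outputs y of shape (T,) and the final state of shape (N,).
    """
    h = np.zeros_like(A) if h0 is None else h0.copy()
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A * h + B * x_t
        y[t] = C @ h
    return y, h

def window_order(height, width, win):
    """Flat indices that visit an image window by window, row-major inside each window."""
    idx = np.arange(height * width).reshape(height, width)
    blocks = idx.reshape(height // win, win, width // win, win).swapaxes(1, 2)
    return blocks.reshape(-1)

rng = np.random.default_rng(0)
image = rng.standard_normal((4, 4))
pixels = image.reshape(-1)
A, B, C = np.full(3, 0.9), np.ones(3), np.ones(3) / 3

# Global scan: pixels in row-major order across the whole image.
global_y, _ = ssm_scan(pixels, A, B, C)

# Local scan: the same recurrence, but neighbouring pixels of a 2x2 window
# are adjacent in the sequence (an assumed stand-in for "local enhancement").
order = window_order(4, 4, win=2)
local_y, local_state = ssm_scan(pixels[order], A, B, C)

# Assumed stand-in for "state sharing": start a second scan from the
# state produced by the first one instead of from zeros.
shared_y, _ = ssm_scan(pixels, A, B, C, h0=local_state)
```

The window ordering changes only which pixels are neighbours in the sequence, which is one simple way a sequence model can be made to see local structure. Reusing a final state as the next initial state is the simplest possible form of passing state between scans.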

FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion

MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification

MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion

Multi-Modal Image Fusion Via Deep Laplacian Pyramid Hybrid Network

LocalMamba: Visual State Space Model with Windowed Selective Scan

Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model

MVF: A Novel Infrared and Visible Image Fusion Approach Based on the Morphing Convolutional Structure and the Light-Weight Visual State Space Block

O-Mamba: O-shape State-Space Model for Underwater Image Enhancement

Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion

WaterMamba: Visual State Space Model for Underwater Image Enhancement

MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 GFLOPs

Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

Fusion-Mamba for Cross-modality Object Detection

PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

LeGFusion: Locally-enhanced Global Learning for Multi-Modal Image Fusion

Pan-Mamba: Effective pan-sharpening with State Space Model
