A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion

Zihan Cao, Xiao Wu, Liang-Jian Deng, Yu Zhong
2024-08-22
Abstract: In image fusion tasks, images from different sources possess distinct characteristics. This has driven the development of numerous methods that explore better ways of fusing them while preserving their respective characteristics. Mamba, a state space model, emerged in the field of natural language processing, and many recent studies have attempted to extend it to vision tasks. However, because images differ in nature from causal language sequences, the limited state capacity of Mamba weakens its ability to model image information. Additionally, the sequence modeling of Mamba captures only spatial information and cannot effectively capture the rich spectral information in images. Motivated by these challenges, we customize and improve the vision Mamba network for the image fusion task. Specifically, we propose the local-enhanced vision Mamba block, dubbed LEVM. The LEVM block improves the network's perception of local information and simultaneously learns local and global spatial information. Furthermore, we propose a state sharing technique to enhance spatial details and integrate spatial and spectral information. Finally, the overall network is a multi-scale structure based on vision Mamba, called LE-Mamba. Extensive experiments show that the proposed method achieves state-of-the-art results on multispectral pansharpening and multispectral and hyperspectral image fusion datasets, demonstrating its effectiveness. Code is available at https://github.com/294coder/Efficient-MIF.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address two key issues in image fusion:

1. **Local Information Perception Enhancement**: Existing vision Mamba models have limited capability in handling local information, which matters in image fusion tasks that require processing local and global spatial information simultaneously. The authors therefore propose a local-enhanced vision Mamba (LEVM) block to improve local information perception.
2. **State Sharing Technique**: Existing methods lose information when processing high-resolution images and cannot effectively capture both the spatial and spectral information of images. The authors design a state sharing technique to reduce this information loss and to learn spatial and spectral information simultaneously.

With these improvements, the LE-Mamba network achieves state-of-the-art performance on tasks such as multispectral pansharpening and multispectral and hyperspectral image fusion. LE-Mamba is based on the U-Net architecture and introduces LEVM blocks and the state sharing technique into the encoder-decoder structure, thereby achieving strong results on multiple benchmark datasets.
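
To make the ideas of causal scanning, local windows, and a carried-over state more concrete, the toy sketch below runs a scalar linear state space recurrence (`h_t = a*h_{t-1} + b*x_t`, `y_t = c*h_t`) over a small image. It is not the authors' implementation. The function names (`ssm_scan`, `local_windows`), the fixed scalar parameters, the non-overlapping window layout, and the use of `h0` to pass a state from one scan to the next are all assumptions made for illustration; the actual LEVM block and state sharing technique are defined in the paper and its repository.

```python
from typing import List, Tuple

Image = List[List[float]]


def ssm_scan(xs: List[float], a: float = 0.5, b: float = 1.0,
             c: float = 1.0, h0: float = 0.0) -> Tuple[List[float], float]:
    """Scalar linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    The scan is causal: y_t depends only on x_0..x_t and on h0.
    Returns the outputs and the final state.
    """
    h = h0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys, h


def flatten_rowwise(img: Image) -> List[float]:
    """Turn an H x W image into one sequence, row by row."""
    return [p for row in img for p in row]


def local_windows(img: Image, k: int) -> List[List[float]]:
    """Split the image into non-overlapping k x k windows (hypothetical
    layout), each flattened row by row into its own short sequence."""
    height, width = len(img), len(img[0])
    windows = []
    for i in range(0, height, k):
        for j in range(0, width, k):
            windows.append([img[r][col]
                            for r in range(i, min(i + k, height))
                            for col in range(j, min(j + k, width))])
    return windows


img = [[1.0, 2.0, 3.0, 4.0],
       [5.0, 6.0, 7.0, 8.0]]

# Local pass: each window is scanned independently, so the state only
# has to summarize a small neighborhood.
local_out = [ssm_scan(w)[0] for w in local_windows(img, k=2)]

# Global pass: one scan over the whole flattened image.
global_out, final_state = ssm_scan(flatten_rowwise(img))

# Illustrative "state passing": start a second scan from the final state
# of the first one instead of from zero (an assumption, not the paper's rule).
shared_out, _ = ssm_scan(flatten_rowwise(img), h0=final_state)
```

In the global pass, neighboring pixels in different rows end up far apart in the sequence, which is one way to see why a purely sequential scan can struggle with local image structure. The local pass keeps such neighbors within the same short sequence.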