MambaIRv2: Attentive State Space Restoration

Hang Guo, Yong Guo, Yaohua Zha, Yulun Zhang, Wenbo Li, Tao Dai, Shu-Tao Xia, Yawei Li
2024-11-22
Abstract: Mamba-based image restoration backbones have recently demonstrated significant potential in balancing global reception and computational efficiency. However, the inherent causal modeling limitation of Mamba, where each token depends solely on its predecessors in the scanned sequence, restricts the full utilization of pixels across the image and thus presents new challenges in image restoration. In this work, we propose MambaIRv2, which equips Mamba with a non-causal modeling ability similar to ViTs to reach the attentive state space restoration model. Specifically, the proposed attentive state-space equation allows the model to attend beyond the scanned sequence and facilitates image unfolding with just a single scan. Moreover, we introduce a semantic-guided neighboring mechanism to encourage interaction between distant but similar pixels. Extensive experiments show that MambaIRv2 outperforms SRFormer by as much as 0.35 dB PSNR for lightweight SR, even with 9.3% fewer parameters, and surpasses HAT on classic SR by up to 0.29 dB. Code is available at https://github.com/csguoh/MambaIR.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses a set of challenges that existing Mamba-based methods face in image restoration because of their inherently causal modeling. Specifically:

1. **Limited global perception**: In existing Mamba methods, each pixel can rely only on the pixels that precede it in the scanned sequence, so it cannot fully use the helpful pixels elsewhere in the image.
2. **High computational cost of multi-direction scanning**: To alleviate this information loss, existing methods usually scan the image in multiple directions, which significantly increases computational complexity, especially for high-resolution inputs.
3. **Weak interaction between long-distance pixels**: Because of the causal nature of Mamba, the interaction between distant pixels gradually weakens, so relevant pixels that were scanned much earlier in the sequence cannot be used effectively.

To solve these problems, the paper proposes MambaIRv2, which gives Mamba non-causal modeling capabilities so that it can better handle image restoration tasks. The specific improvements are:

1. **Attentive State-Space Equation (ASE)**: By adding global cues to the output matrix \(C\) of the state-space equation, the model can query relevant pixels in the not-yet-scanned part of the sequence, breaking through the causal limitation.
2. **Semantic-Guided Neighboring mechanism (SGN)**: By redefining the neighborhood relationship between pixels, pixels with similar semantics are placed closer together in the unfolded sequence, which strengthens the interaction between distant pixels.

According to the paper, these changes improve both restoration performance and computational efficiency. Experimental results show that MambaIRv2 achieves 0.35 dB higher PSNR than the state-of-the-art method SRFormer on lightweight super-resolution while using 9.3% fewer parameters. On classical super-resolution, MambaIRv2 surpasses HAT by up to 0.29 dB PSNR.
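
The two ideas above can be illustrated with a small NumPy toy. This is only a sketch under stated assumptions, not the authors' implementation: the recurrence is a simplified diagonal linear state-space scan, the "global cue" is taken to be a projection of the mean token (the paper's actual cue may be constructed differently), and the names `causal_scan`, `semantic_order`, `W_c`, and `W_p` are hypothetical. The first part checks that a purely causal scan ignores later tokens while a global cue added to \(C\) lets earlier outputs react to them; the second part reorders tokens so that equal semantic labels become neighbors in the unfolded sequence.

```python
import numpy as np


def causal_scan(x, A, B, C):
    """Simplified linear state-space recurrence over a 1-D token sequence.

    x: (L, D) tokens, A: (N,) diagonal decay, B: (N, D) input matrix,
    C: (L, N) per-token output (readout) vectors.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        h = A * h + B @ x[i]   # the state only ever sees tokens 0..i
        y[i] = C[i] @ h        # readout for token i
    return y


def semantic_order(labels):
    """Return a permutation that groups equal labels, plus its inverse."""
    perm = np.argsort(labels, kind="stable")
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(perm.size)
    return perm, inverse


rng = np.random.default_rng(0)
L, D, N = 6, 4, 3
x = rng.normal(size=(L, D))
A = np.full(N, 0.9)
B = rng.normal(size=(N, D))
W_c = rng.normal(size=(D, N))  # hypothetical projection: token -> C
W_p = rng.normal(size=(D, N))  # hypothetical projection: global cue -> C offset


def outputs(tokens, use_global_cue):
    C = tokens @ W_c                          # (L, N), depends on each token only
    if use_global_cue:
        # Assumption: the cue is the mean over ALL tokens, broadcast to every row.
        C = C + tokens.mean(axis=0) @ W_p
    return causal_scan(tokens, A, B, C)


# Perturb only the last token in scan order and compare the earlier outputs.
x_changed = x.copy()
x_changed[-1] += 1.0
causal_same = np.allclose(outputs(x, False)[:-1], outputs(x_changed, False)[:-1])
cue_same = np.allclose(outputs(x, True)[:-1], outputs(x_changed, True)[:-1])
print("earlier outputs unchanged, causal scan:", causal_same)
print("earlier outputs unchanged, with global cue:", cue_same)

# Semantic-guided reordering: made-up labels for the six tokens.
labels = np.array([2, 0, 1, 0, 2, 1])
perm, inverse = semantic_order(labels)
print("labels in unfolded order:", labels[perm].tolist())  # [0, 0, 1, 1, 2, 2]
restored = x[perm][inverse]   # undo the reordering after the scan
```

In this toy, the reordering is undone with the inverse permutation so that the outputs can be placed back at their original pixel positions; how the labels are obtained is left open here and is an assumption of the sketch.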