Abstract: Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. ViTs typically outperform CNNs in image restoration because they capture long-range dependencies and have input-dependent characteristics. However, the computational complexity of Transformer-based models grows quadratically with the image resolution, which limits their practical appeal for high-resolution image restoration. In this paper, we propose a simple yet effective visual state space model (EVSSM) for image deblurring that brings the benefits of state space models (SSMs) to visual data. Unlike existing methods that apply several fixed-direction scans for feature extraction, which significantly increases the computational cost, we develop an efficient visual scan block that applies various geometric transformations before each SSM-based module, capturing useful non-local information while maintaining high efficiency. Extensive experiments show that the proposed EVSSM performs favorably against state-of-the-art image deblurring methods on benchmark datasets and real-captured images.
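The abstract's scaling argument can be made concrete with a rough count. The sketch below assumes one token per pixel and ignores channel width, constant factors, and hardware effects; it compares the number of token pairs scored by global self-attention with the number of recurrence steps in a linear-time SSM scan. The four-direction count is an assumed example of the multi-direction scanning the abstract contrasts with EVSSM, and the helper names are hypothetical.

```python
def attention_pairs(height: int, width: int) -> int:
    """Token pairs scored by global self-attention over H*W pixel tokens: O(N^2)."""
    n = height * width
    return n * n


def scan_steps(height: int, width: int, directions: int = 1) -> int:
    """Recurrence steps of an SSM scan over H*W tokens: O(N) per scan direction."""
    return height * width * directions


for side in (64, 256, 1024):
    pairs = attention_pairs(side, side)
    four_dirs = scan_steps(side, side, directions=4)  # assumed multi-direction scheme
    one_dir = scan_steps(side, side, directions=1)    # one scan per block
    print(f"{side}x{side}: attention={pairs:,} scan(4 dirs)={four_dirs:,} scan(1 dir)={one_dir:,}")
```

Doubling the side length multiplies the attention count by 16 but each scan count only by 4, which is why a single scan per block is attractive at high resolution.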

What problem does this paper attempt to address?

The paper addresses the challenges of image deblurring by proposing a new model. Specifically, it targets the following problems:

1. **Limitations of existing methods**: Methods based on Convolutional Neural Networks (CNNs) have limitations in image deblurring. Convolution is spatially invariant and local, so CNNs struggle to capture the spatially varying characteristics of image content and the non-local information that benefits deblurring.
2. **Trade-off between efficiency and performance**: Transformer architectures capture global information through self-attention and perform well in image restoration, but the cost of global self-attention grows quadratically with the number of pixels, which limits high-resolution processing. Methods that reduce this cost, such as local-window attention or transposed attention, sacrifice the ability to model non-local or spatial information, which degrades the quality of the restored image.
3. **Need to explore non-local information**: An efficient method is therefore needed that can exploit non-local information without significantly increasing computational cost, so as to achieve high-quality deblurring.

To address these issues, the paper proposes a simple yet effective Efficient Visual State Space Model (EVSSM), which applies State Space Models (SSMs) to visual data. EVSSM uses the ability of SSMs to capture long-range dependencies, together with an Efficient Visual Scan (EVS) strategy that captures non-local spatial information at low computational cost.
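The idea behind the Efficient Visual Scan can be illustrated with a toy, single-channel sketch. It is not the paper's implementation: the fixed scalar recurrence stands in for Mamba's input-dependent selective scan, and the transform set (identity, horizontal flip, transpose), the inverse transform after each block, and the residual connection are all assumptions chosen for illustration. What it shows is that each block performs only one linear-time scan, while cycling the geometric transform changes the order in which pixels are visited from block to block.

```python
from typing import List

Grid = List[List[float]]


def identity(g: Grid) -> Grid:
    return [row[:] for row in g]


def flip_h(g: Grid) -> Grid:
    """Mirror each row left-to-right (self-inverse)."""
    return [row[::-1] for row in g]


def transpose(g: Grid) -> Grid:
    """Swap rows and columns (self-inverse)."""
    return [list(col) for col in zip(*g)]


def ssm_scan(g: Grid, a: float = 0.9, b: float = 1.0, c: float = 1.0) -> Grid:
    """Toy linear SSM over the raster-order sequence (state carries across rows):
    h_t = a*h_{t-1} + b*x_t,  y_t = c*h_t.  Cost is linear in H*W."""
    h = 0.0
    out: Grid = []
    for row in g:
        out_row = []
        for x in row:
            h = a * h + b * x
            out_row.append(c * h)
        out.append(out_row)
    return out


# Hypothetical transform schedule: (forward transform, its inverse).
TRANSFORMS = [(identity, identity), (flip_h, flip_h), (transpose, transpose)]


def evs_stack(g: Grid, num_blocks: int) -> Grid:
    """One single-direction scan per block, cycling the geometric transform."""
    for k in range(num_blocks):
        fwd, inv = TRANSFORMS[k % len(TRANSFORMS)]
        scanned = inv(ssm_scan(fwd(g)))  # back to the original layout (assumed)
        # Residual connection (assumed): add the scan output to the block input.
        g = [[x + y for x, y in zip(r0, r1)] for r0, r1 in zip(g, scanned)]
    return g
```

With a single bright pixel at the right end of a row, the first (identity) block cannot propagate it leftward, but after the flipped block the leftmost pixel has received a contribution, at the cost of one scan per block rather than several.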
Additionally, the paper introduces an Efficient Discriminative Frequency Domain-based Feedforward Network (EDFFN) to further improve the efficiency of feature transformation. Experiments show that EVSSM performs comparably to or better than state-of-the-art methods in both quantitative and qualitative evaluations on multiple benchmark datasets, including GoPro, HIDE, and RealBlur. These results demonstrate the effectiveness and efficiency of EVSSM for image deblurring.
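As its name indicates, EDFFN is a frequency-domain-based feedforward network. The sketch below shows only the generic operation such a layer builds on: transform a small square patch with a 2-D DFT, scale each frequency coefficient by a weight (learnable in a real network), and transform back. The naive DFT, the single-channel patch, and the real-valued weights are simplifying assumptions, and the helper names are hypothetical; the sketch does not reproduce the paper's EDFFN design, such as where the gate sits in the network or how patches and channels are handled.

```python
import cmath
from typing import List

Patch = List[List[float]]
CGrid = List[List[complex]]


def dft2(x: CGrid, inverse: bool = False) -> CGrid:
    """Naive 2-D DFT of an n x n grid (O(n^4)); adequate for tiny illustrative patches."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0j
            for i in range(n):
                for j in range(n):
                    s += x[i][j] * cmath.exp(sign * 2j * cmath.pi * (u * i + v * j) / n)
            out[u][v] = s / (n * n) if inverse else s
    return out


def frequency_gate(patch: Patch, weights: Patch) -> Patch:
    """Scale each DFT coefficient of a square patch by a real weight, then invert.

    The result is exactly real when weights[u][v] == weights[-u % n][-v % n];
    otherwise taking .real discards the imaginary part.
    """
    spec = dft2([[complex(p) for p in row] for row in patch])
    gated = [[s * w for s, w in zip(srow, wrow)] for srow, wrow in zip(spec, weights)]
    return [[z.real for z in row] for row in dft2(gated, inverse=True)]
```

Keeping only the DC weight returns the patch mean at every position, and all-ones weights return the input unchanged; a trained network would instead learn which frequencies to emphasize. A practical implementation would use an FFT rather than this quadruple loop.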

Efficient Visual State Space Model for Image Deblurring