Abstract: Stereo Image Super-Resolution (stereoSR) has attracted significant attention in recent years due to the extensive deployment of dual cameras in mobile phones, autonomous vehicles and robots. In this work, we propose a new stereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration, and the frequency domain knowledge obtained by the Fast Fourier Convolution (FFC). Specifically, to effectively gather global information, we modify the Residual Swin Transformer Blocks (RSTBs) in SwinIR by explicitly incorporating frequency domain knowledge through the FFC, and we employ the resulting Residual Swin Fourier Transformer Blocks (RSFTBs) for feature extraction. In addition, for efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM, which achieves highly competitive performance while requiring lower computational cost than state-of-the-art cross-attention modules. Extensive experimental results and ablation studies demonstrate the effectiveness and efficiency of the proposed SwinFSR.
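The abstract credits the FFC with supplying global information. Below is a minimal NumPy sketch of the spectral (global) branch of an FFC, its Fourier unit, assuming a single (C, H, W) feature map with no batch dimension: a real 2-D FFT, a 1x1 convolution (channel mixing) on the stacked real and imaginary parts, a ReLU, and an inverse FFT. It is not the SwinFSR code. Batch normalization, FFC's local convolutional branch, and the way the branch is wired into an RSFTB are all omitted, and the name `fourier_unit` and its `weight` parameter are hypothetical. Because every frequency coefficient depends on every pixel, perturbing one pixel changes the output everywhere, which is the sense in which the branch is global.

```python
import numpy as np


def fourier_unit(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Simplified global (spectral) branch of a Fast Fourier Convolution.

    x:      feature map of shape (C, H, W), no batch dimension (assumption).
    weight: (2C, 2C) matrix, a hypothetical stand-in for the learned 1x1
            convolution applied to the stacked real/imaginary spectrum.
    BatchNorm and the local convolutional branch of FFC are omitted.
    """
    c, h, w = x.shape
    # Real 2-D FFT over the spatial axes -> complex array (C, H, W//2 + 1).
    spec = np.fft.rfft2(x, axes=(-2, -1))
    # Treat real and imaginary parts as separate channels: (2C, H, W//2 + 1).
    z = np.concatenate([spec.real, spec.imag], axis=0)
    # A 1x1 convolution is a channel-mixing matrix applied at every frequency.
    z = np.einsum("oc,chw->ohw", weight, z)
    z = np.maximum(z, 0.0)  # ReLU
    # Reassemble the complex spectrum and return to the spatial domain.
    spec = z[:c] + 1j * z[c:]
    return np.fft.irfft2(spec, s=(h, w), axes=(-2, -1))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
weight = 0.1 * rng.standard_normal((8, 8))
y = fourier_unit(x, weight)
print(y.shape)  # (4, 16, 16)

# Perturb one corner pixel; the opposite corner of the output also changes,
# because every frequency coefficient depends on every input pixel.
x2 = x.copy()
x2[:, 0, 0] += 1.0
y2 = fourier_unit(x2, weight)
print(np.abs(y2[:, -1, -1] - y[:, -1, -1]).max())
```

The FFTs cost O(HW log HW) per channel, so this image-wide receptive field comes without the cost quadratic in HW that global self-attention over all pixels would incur.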

What problem does this paper attempt to address?

The paper addresses stereo image super-resolution (stereoSR) and proposes a new solution. Specifically, the authors design a method called SwinFSR, which combines SwinIR (a Transformer architecture originally designed for single image restoration) with frequency domain knowledge to enhance the resolution of stereo images. The main contributions of SwinFSR can be summarized as follows:

1. **A new method, SwinFSR**: After systematically analyzing existing methods, the authors identify several common issues, such as the lack of a dedicated mechanism in existing SwinIR-based models for exploiting dual-view features and the failure to fully leverage global information. To address these problems, they propose SwinFSR, which inherits the strengths of SwinIR and explicitly integrates Fast Fourier Convolution (FFC) to exploit both spatial and spectral features.
2. **The RCAM module**: To exchange information effectively between the two views, the authors also propose a new cross-attention module, RCAM (Residual Cross Attention Module). Compared with existing cross-attention modules such as SAM, SCAM, and biPAM, RCAM balances efficient inference and accurate learning, improving inference speed without significantly degrading performance.
3. **Extensive experimental validation**: The authors conduct numerous experiments, including different training and testing strategies, to demonstrate the effectiveness and efficiency of SwinFSR. The results show that SwinFSR achieves highly competitive performance across multiple datasets.

In summary, this work aims to improve stereo image super-resolution by introducing a new network architecture and an improved attention mechanism, with particular advantages in fusing dual-view information.
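RCAM's internals are not described here, so the sketch below shows instead the general mechanism shared by the stereo cross-attention modules it is compared against (SAM, SCAM, biPAM). For a rectified stereo pair, corresponding pixels lie on the same image row, so attention is computed independently per row. The sketch follows the SCAM-style pattern of reusing one score matrix in both directions. It is an illustration under these assumptions, not RCAM itself, and the function name `row_cross_attention` and the projection matrices `wq`, `wk`, `wv_left`, `wv_right` are hypothetical.

```python
import numpy as np


def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def row_cross_attention(left, right, wq, wk, wv_left, wv_right, gamma=1.0):
    """Bidirectional cross-attention restricted to epipolar lines (image rows).

    left, right: (H, W, C) features of a rectified stereo pair.
    wq, wk, wv_left, wv_right: hypothetical (C, C) projection matrices.
    gamma: weight of the cross-view term in the residual fusion.
    """
    c = left.shape[-1]
    q = left @ wq   # queries from the left view
    k = right @ wk  # keys from the right view
    # One (W x W) score matrix per row: left pixel i vs. right pixel j in row h.
    scores = np.einsum("hic,hjc->hij", q, k) / np.sqrt(c)
    # Right-to-left: each left pixel gathers right-view values from its own row.
    r2l = softmax(scores, axis=-1) @ (right @ wv_right)
    # Left-to-right: reuse the same scores, transposed.
    l2r = softmax(scores.transpose(0, 2, 1), axis=-1) @ (left @ wv_left)
    # Residual fusion keeps each view's own features.
    return left + gamma * r2l, right + gamma * l2r


rng = np.random.default_rng(0)
h, w, c = 4, 6, 3
left = rng.standard_normal((h, w, c))
right = rng.standard_normal((h, w, c))
weights = [rng.standard_normal((c, c)) for _ in range(4)]
new_left, new_right = row_cross_attention(left, right, *weights)
print(new_left.shape, new_right.shape)  # (4, 6, 3) (4, 6, 3)
```

Restricting attention to rows costs O(HW²C) instead of O(H²W²C) for attention over all pixel pairs, which is why this family of modules is practical at full feature resolution. The residual form means that with `gamma = 0` each view passes through unchanged.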

SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge

CSwT-SR: Conv-Swin Transformer for Blind Remote Sensing Image Super-Resolution with Amplitude-Phase Learning and Structural Detail Alternating Learning

SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution

SwinDPSR: Dual-Path Face Super-Resolution Network Integrating Swin Transformer

FCSwinU: Fourier Convolutions and Swin Transformer UNet for Hyperspectral and Multispectral Image Fusion

A Swin Transformer-Based Fusion Approach for Hyperspectral Image Super-Resolution

SwinFuSR: an image fusion-inspired model for RGB-guided thermal image super-resolution

SwinIBSR: Toward real-world infrared image super-resolution

Learning Accurate and Enriched Features for Stereo Image Super-Resolution

Resolution enhancement processing on low quality images using swin transformer based on interval dense connection strategy

Boosting Image Super-Resolution Via Fusion of Complementary Information Captured by Multi-Modal Sensors

SRBPSwin: Single-Image Super-Resolution for Remote Sensing Images Using a Global Residual Multi-Attention Hybrid Back-Projection Network Based on the Swin Transformer

Revolutionizing Space Health (Swin-FSR): Advancing Super-Resolution of Fundus Images for SANS Visual Assessment Technology

FE-FAIR: Feature-Enhanced Fused Attention for Image Super-Resolution

Residual SwinV2 transformer coordinate attention network for image super resolution

SwinFG: A fine-grained recognition scheme based on swin transformer

Dual Self-Attention Swin Transformer for Hyperspectral Image Super-Resolution

Infrared Image Super-Resolution Network Utilizing the Enhanced Transformer and U-Net

A novel image restoration solution for cross-resolution person re-identification

PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution

SwinIR: Image Restoration Using Swin Transformer