SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge

Ke Chen, Liangyan Li, Huan Liu, Yunzhe Li, Congling Tang, Jun Chen
2023-04-25
Abstract: Stereo Image Super-Resolution (stereoSR) has attracted significant attention in recent years due to the extensive deployment of dual cameras in mobile phones, autonomous vehicles and robots. In this work, we propose a new stereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration, and the frequency domain knowledge obtained by the Fast Fourier Convolution (FFC). Specifically, to effectively gather global information, we modify the Residual Swin Transformer blocks (RSTBs) in SwinIR by explicitly incorporating the frequency domain knowledge using the FFC and employing the resulting residual Swin Fourier Transformer blocks (RSFTBs) for feature extraction. Besides, for the efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM, which achieves highly competitive performance while requiring less computational cost than the state-of-the-art cross-attention modules. Extensive experimental results and ablation studies demonstrate the effectiveness and efficiency of our proposed SwinFSR.
Computer Vision and Pattern Recognition
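The abstract's claim that FFC helps "gather global information" rests on a property of the Fourier transform: a pointwise operation on the spectrum affects every spatial position at once. The sketch below illustrates that property with a single-channel, pure-Python version of the Fourier unit at the core of FFC's global branch. It is not SwinFSR's code. The helper names (`dft2`, `idft2`, `fourier_unit`), the 2x2 `weight` matrix (a stand-in for the learned 1x1 convolution over stacked real and imaginary parts), the omission of batch normalization, and keeping the real part of a full complex inverse DFT (instead of the real-FFT pair FFC uses) are all simplifying assumptions.

```python
import cmath
import math
import random


def dft2(x):
    """Naive 2D discrete Fourier transform of an H x W grid (list of lists)."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(
                x[m][n] * cmath.exp(-2j * math.pi * (u * m / h + v * n / w))
                for m in range(h)
                for n in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def idft2(X):
    """Naive 2D inverse DFT; returns complex values."""
    h, w = len(X), len(X[0])
    return [
        [
            sum(
                X[u][v] * cmath.exp(2j * math.pi * (u * m / h + v * n / w))
                for u in range(h)
                for v in range(w)
            )
            / (h * w)
            for n in range(w)
        ]
        for m in range(h)
    ]


def fourier_unit(x, weight):
    """Hypothetical single-channel sketch of an FFC-style Fourier unit.

    At every frequency, the (real, imag) pair is mixed by the 2x2 `weight`
    matrix (stand-in for a learned 1x1 convolution), passed through ReLU,
    and transformed back to the spatial domain.
    """
    spectrum = dft2(x)
    mixed = []
    for row in spectrum:
        new_row = []
        for z in row:
            re = weight[0][0] * z.real + weight[0][1] * z.imag
            im = weight[1][0] * z.real + weight[1][1] * z.imag
            new_row.append(complex(max(re, 0.0), max(im, 0.0)))  # ReLU
        mixed.append(new_row)
    # Keep the real part (simplification of FFC's real-FFT pair).
    return [[v.real for v in row] for row in idft2(mixed)]


# Usage: perturb one corner pixel and count how many output pixels move.
random.seed(0)
H, W = 6, 6
x = [[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
weight = [[1.0, 0.5], [-0.5, 1.0]]

y = fourier_unit(x, weight)
x2 = [row[:] for row in x]
x2[0][0] += 1.0
y2 = fourier_unit(x2, weight)
changed = sum(abs(a - b) > 1e-6 for r1, r2 in zip(y, y2) for a, b in zip(r1, r2))
print(changed, "of", H * W, "output pixels changed")
```

Because a single-pixel change alters every frequency coefficient, its effect reaches output pixels across the whole grid within one layer, whereas one 3x3 spatial convolution would change only a 3x3 neighborhood. This is the sense in which a frequency branch complements the window-limited self-attention of Swin blocks.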
What problem does this paper attempt to address?
The paper addresses stereo image super-resolution (stereoSR), the task of reconstructing a high-resolution stereo image pair from a low-resolution one, and proposes a new method for it. Specifically, the authors design SwinFSR, which combines SwinIR (a Transformer architecture originally designed for single-image restoration) with frequency domain knowledge to enhance the resolution of stereo images. The main contributions of SwinFSR can be summarized as follows:

1. **A new method, SwinFSR**: From a systematic analysis of existing methods, the authors identify common shortcomings: existing SwinIR-based models lack a dedicated mechanism for exploiting features from both views, and they do not fully leverage global information. To address these problems, SwinFSR inherits the strengths of SwinIR and explicitly integrates Fast Fourier Convolution (FFC) to exploit both spatial and spectral features.
2. **A new cross-attention module, RCAM**: To exchange information effectively between the two views, the authors propose a new cross-attention module, RCAM (Residual Cross Attention Module). Compared with existing cross-attention modules such as SAM, SCAM, and biPAM, RCAM balances efficient inference against accurate learning: it runs faster, at a lower computational cost, without significantly degrading performance.
3. **Extensive experimental validation**: The authors run numerous experiments, including different training and testing strategies, to demonstrate the effectiveness and efficiency of SwinFSR. The results show that SwinFSR achieves highly competitive performance on multiple datasets.

In summary, this work aims to improve stereo image super-resolution by introducing a new network architecture and an improved cross-attention mechanism, with particular advantages in fusing information from the two views.
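The summary does not spell out RCAM's internals, so the following is only a generic sketch of the row-wise (epipolar) stereo cross-attention that modules such as SCAM and biPAM build on, here with a residual connection; it is not RCAM. It assumes rectified stereo pairs, so corresponding pixels lie in the same image row, and it omits learned query/key/value projections and normalization layers. The names `cross_attend_rows`, `stereo_cross_attention`, and the residual scale `gamma` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend_rows(query_img, key_img, gamma):
    """For each row, every query pixel attends to all pixels in the same row
    of the other view (the epipolar line of a rectified pair).

    Images are H x W x C nested lists. The attended features are added back
    to the query view through a residual connection scaled by `gamma`.
    """
    out = []
    for q_row, k_row in zip(query_img, key_img):
        c = len(q_row[0])
        new_row = []
        for q in q_row:
            # Scaled dot-product scores against every pixel in the other row.
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in k_row]
            weights = softmax(scores)
            # Keys double as values here; no learned projections.
            attended = [sum(wt * k[ch] for wt, k in zip(weights, k_row)) for ch in range(c)]
            new_row.append([qc + gamma * ac for qc, ac in zip(q, attended)])
        out.append(new_row)
    return out


def stereo_cross_attention(left, right, gamma=0.5):
    """Bidirectional exchange: left attends to right and right attends to left."""
    return cross_attend_rows(left, right, gamma), cross_attend_rows(right, left, gamma)


# Usage on a toy 3 x 5 x 4 stereo pair.
left = [[[0.1 * (h + w + ch) for ch in range(4)] for w in range(5)] for h in range(3)]
right = [[[0.1 * (h - w + ch) for ch in range(4)] for w in range(5)] for h in range(3)]
new_left, new_right = stereo_cross_attention(left, right)
print(len(new_left), len(new_left[0]), len(new_left[0][0]))  # prints 3 5 4
```

Restricting attention to one row costs on the order of H·W²·C multiply-adds per direction, versus (H·W)²·C for attention over all pixel pairs, which is why stereo cross-attention modules work along epipolar lines. The lower cost the paper reports for RCAM relative to SCAM and biPAM is not reproduced by this sketch.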