Abstract: Recently, window-based attention methods have shown great potential for computer vision tasks, particularly in Single Image Super-Resolution (SISR). However, they may fall short in capturing long-range dependencies and relationships between distant tokens. Additionally, we find that learning in the spatial domain alone does not convey the frequency content of the image, which is a crucial aspect of SISR. To tackle these issues, we propose a new Channel-Partitioned Attention Transformer (CPAT) to better capture long-range dependencies by sequentially expanding windows along the height and width of feature maps. In addition, we propose a novel Spatial-Frequency Interaction Module (SFIM), which incorporates information from the spatial and frequency domains to provide more comprehensive information from feature maps. This includes information about the frequency content and enhances the receptive field across the entire image. Experimental findings show the effectiveness of our proposed modules and architecture. In particular, CPAT surpasses current state-of-the-art methods by up to 0.31 dB at x2 SR on Urban100.
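As a rough illustration of why frequency-domain features relate to "the receptive field across the entire image", the sketch below computes a naive 2-D discrete Fourier transform of a tiny image in pure Python and counts how many frequency coefficients change when a single pixel is modified. This is not the authors' SFIM; the helper names `dft2` and `changed_coefficients` and the 4x4 toy image are assumptions made purely for illustration.

```python
import cmath


def dft2(img):
    """Naive 2-D DFT of a list-of-lists image (O(N^4), tiny inputs only)."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * cmath.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(1j * angle)
            out[u][v] = acc
    return out


def changed_coefficients(a, b, tol=1e-9):
    """Count frequency coefficients that differ between images a and b."""
    fa, fb = dft2(a), dft2(b)
    return sum(
        abs(fa[u][v] - fb[u][v]) > tol
        for u in range(len(a))
        for v in range(len(a[0]))
    )


# Hypothetical toy data: an all-zero 4x4 image and a copy with one pixel set.
base = [[0.0] * 4 for _ in range(4)]
bumped = [row[:] for row in base]
bumped[2][1] = 1.0

n_changed = changed_coefficients(base, bumped)
print(n_changed, "of", 4 * 4, "coefficients changed")
```

Every coefficient depends on every pixel, so a single-pixel edit changes all of them; an operation applied to frequency coefficients therefore mixes information from the whole image, unlike a local window in the spatial domain.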

What problem does this paper attempt to address?

This paper attempts to solve two main problems in the Single Image Super-Resolution (SISR) task:

1. **Capturing long-distance dependencies**: Existing window-based attention mechanisms have shown great potential in SISR, but may be insufficient for capturing long-distance dependencies and the relationships between distant tokens. This limits the model's understanding and use of global information.
2. **Lack of frequency-domain information**: Existing SISR methods mainly extract features in the spatial domain and ignore information in the frequency domain. However, frequency-domain information is crucial for HR image reconstruction because it carries the details and structural information of the image.

To address these problems, the authors propose two modules:

- **Channel-Partitioned Windowed Self-Attention (CPWin-SA)**: The attention mechanism is enhanced by expanding the window along the height and width of the input feature map, so as to better capture long-distance dependencies and the relationships between distant tokens.
- **Spatial-Frequency Interaction Module (SFIM)**: Information from the spatial domain and the frequency domain is combined to make fuller use of the feature map and improve the quality of the output image.

Together, these two modules allow the proposed model to outperform existing state-of-the-art methods on SISR. Specifically, CPAT improves on HAT by 0.31 dB in the x2 super-resolution task on the Urban100 dataset.

In summary, this paper addresses the shortcomings of existing SISR methods in capturing long-distance dependencies and in using frequency-domain information. It does so by introducing a new attention mechanism and a method for fusing spatial and frequency-domain information, thereby improving the quality of super-resolved images.
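The effect of expanding a window along the height or width can be sketched with a small token-to-window mapping. This is a minimal pure-Python illustration, not the paper's implementation: it assumes three window shapes (square, full-height strips, full-width strips), and the helper names `partition_windows` and `same_window` as well as the 16x16 map and window size 4 are hypothetical.

```python
def partition_windows(height, width, win_h, win_w):
    """Map each (row, col) token to the index of its non-overlapping window."""
    if height % win_h or width % win_w:
        raise ValueError("feature map must be divisible by the window size")
    per_row = width // win_w  # number of windows along the width
    return {
        (r, c): (r // win_h) * per_row + (c // win_w)
        for r in range(height)
        for c in range(width)
    }


def same_window(assignment, a, b):
    """True if tokens a and b can attend to each other (share a window)."""
    return assignment[a] == assignment[b]


H, W, ws = 16, 16, 4  # assumed feature-map size and base window size

square = partition_windows(H, W, ws, ws)    # standard ws x ws windows
vertical = partition_windows(H, W, H, ws)   # window expanded along the height
horizontal = partition_windows(H, W, ws, W) # window expanded along the width

top, bottom = (0, 1), (15, 1)   # same column, far apart vertically
left, right = (2, 0), (2, 15)   # same row, far apart horizontally

reach = {
    name: (same_window(a, top, bottom), same_window(a, left, right))
    for name, a in [("square", square),
                    ("vertical", vertical),
                    ("horizontal", horizontal)]
}
print(reach)
```

Under these assumptions, the square window connects neither distant pair, while the height-expanded window links the vertically distant tokens and the width-expanded window links the horizontally distant ones. Assigning different window shapes to different channel groups would let a single layer cover both directions at a token count per window that grows linearly, not quadratically, with the map size.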

Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution

Lightweight Multi-Attention Fusion Network for Image Super-Resolution

Multi-Scale Cross-Attention Fusion Network Based on Image Super-Resolution

DCT-FANet: DCT based frequency attention network for single image super-resolution

Multi-Window Fusion Spatial-Frequency Joint Self-Attention for Remote-Sensing Image Super-Resolution

Enhanced Window-Based Self-Attention with Global and Multi-Scale Representations for Remote Sensing Image Super-Resolution

From Coarse to Fine: Hierarchical Pixel Integration for Lightweight Image Super-resolution

An Attention-Based Approach for Single Image Super Resolution

Remote Sensing Image Super-Resolution Using Enriched Spatial-Channel Feature Aggregation Networks

A Lightweight Multi-Scale Channel Attention Network for Image Super-Resolution

CFAT: Unleashing Triangular Windows for Image Super-resolution

Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution

Channel-Wise and Spatial Feature Modulation Network for Single Image Super-Resolution

Image Super-Resolution With Unified-Window Attention

High-frequency channel attention and contrastive learning for image super-resolution

Dual-path attention network for single image super-resolution

Efficient Image Super-Resolution via Symmetric Visual Attention Network

Enhanced local multi-windows attention network for lightweight image super-resolution

HAAT: Hybrid Attention Aggregation Transformer for Image Super-Resolution

HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution

Dual contrastive attention-guided deformable convolutional network for single image super-resolution