Abstract: Recently, transformers have captured significant interest in single-image super-resolution, demonstrating substantial gains in performance. Current models depend heavily on the network's ability to extract high-level semantic details from images while overlooking the effective utilization of multi-scale image details and intermediate information within the network. Furthermore, it has been observed that high-frequency areas in images are significantly harder to super-resolve than low-frequency areas. This work proposes a transformer-based super-resolution architecture called ML-CrAIST that addresses this gap by utilizing low- and high-frequency information at multiple scales. Unlike most previous work, which uses either spatial or channel attention, we apply both spatial and channel self-attention, which concurrently model pixel interaction along the spatial and channel dimensions and exploit the inherent correlations across both axes. Further, we devise a cross-attention block for super-resolution, which explores the correlations between low- and high-frequency information. Quantitative and qualitative assessments indicate that our proposed ML-CrAIST surpasses state-of-the-art super-resolution methods (e.g., 0.15 dB gain @Manga109 $\times$4). Code is available at: https://github.com/Alik033/ML-CrAIST.
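To make the spatial-versus-channel distinction in the abstract concrete, here is a minimal, dependency-free sketch. It assumes the features are flattened to N pixels by C channels and uses the features themselves as queries, keys, and values (no learned projections). The function names are hypothetical, and this is an illustration only, not the authors' implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def spatial_self_attention(x):
    """x is N pixels x C channels; the attention map is N x N (pixel to pixel)."""
    scale = 1.0 / math.sqrt(len(x[0]))
    scores = [[v * scale for v in row] for row in matmul(x, transpose(x))]
    return matmul(softmax_rows(scores), x)

def channel_self_attention(x):
    """x is N pixels x C channels; the attention map is C x C (channel to channel)."""
    xt = transpose(x)  # C x N
    scale = 1.0 / math.sqrt(len(x))
    scores = [[v * scale for v in row] for row in matmul(xt, x)]  # C x C
    return transpose(matmul(softmax_rows(scores), xt))  # back to N x C

# Toy feature map (assumed values): 4 pixels, 3 channels.
features = [
    [1.0, 0.0, 2.0],
    [0.5, 1.0, 0.0],
    [0.0, 2.0, 1.0],
    [1.5, 0.5, 0.5],
]
spatial_out = spatial_self_attention(features)
channel_out = channel_self_attention(features)
```

Both outputs keep the input shape (4 x 3). The difference lies in what is compared: the spatial variant mixes pixels according to pixel similarity, while the channel variant mixes channels according to channel similarity, so its attention map stays C x C regardless of image size.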

### What problems does this paper attempt to solve?

This paper aims to solve several key problems in the single-image super-resolution (SISR) task:

1. **Effective utilization of multi-scale information**: Most of the existing super-resolution models rely on the network's ability to extract high-level semantic details, but ignore the effective utilization of multi-scale image details and intermediate information within the network. To make up for this deficiency, this paper proposes a new architecture that can utilize low-frequency and high-frequency information at multiple scales.
2. **Complexity of high-frequency regions**: High-frequency regions are more complex than low-frequency regions in an image, which poses a greater challenge to the super-resolution task. Therefore, this paper pays special attention to how to better handle the information in these high-frequency regions.
3. **Combination of spatial-domain and frequency-domain information**: Many existing methods mainly focus on working in the spatial domain and ignore the potential advantages of the frequency domain. The frequency domain can provide better methods to recover lost high-frequency information. For this purpose, this paper designs a cross-attention block (CAB) to fuse low-frequency and high-frequency information, thereby improving super-resolution performance.
4. **Multi-scale perception framework**: There may be repeated texture patterns in a single image (such as the facades of buildings, windows, etc.), which appear at different scales in different positions. Therefore, this paper introduces a multi-scale perception framework that can aggregate information from all different scales of the low-resolution image in order to better capture non-local details.

### Main contributions

1. **Multi-scale model**: A novel multi-scale model is proposed, which utilizes both spatial-domain and frequency-domain features to enhance the spatial resolution of low-resolution images.
2. **Low-and-high-frequency interaction block (LHFIB)**: An LHFIB is introduced, which exchanges information between low-frequency and high-frequency sub-bands through the proposed cross-attention block (CAB).
3. **Non-linear fusion method**: A non-linear method based on the attention mechanism is proposed to recover high-frequency details more accurately.
4. **Multi-scale information fusion**: Use CAB to obtain information features from different scales while retaining high-resolution features to accurately represent spatial details.

Through these improvements, the ML-CrAIST model proposed in this paper outperforms the existing state-of-the-art super-resolution methods in both quantitative and qualitative evaluations on multiple standard datasets, especially achieving a significant performance improvement on the Manga109 dataset.
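The exchange between low-frequency and high-frequency sub-bands can be illustrated with a small sketch. It assumes a single-level Haar wavelet split to obtain the sub-bands (the summary above does not name the transform), treats each row of a half-resolution band as one token, and uses cross-attention without learned projections. `haar_split` and `cross_attention` are hypothetical names, and this is not the authors' LHFIB or CAB.

```python
import math

def haar_split(img):
    """Single-level 2D Haar split of a grayscale image (even height and width).

    Returns four half-resolution bands: the low-frequency approximation and
    three high-frequency detail bands.
    """
    low, d_cols, d_rows, d_diag = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            p, q = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + p + q) / 2)  # local average (low frequency)
            rows[1].append((a - b + p - q) / 2)  # change across columns
            rows[2].append((a + b - p - q) / 2)  # change across rows
            rows[3].append((a - b - p + q) / 2)  # diagonal change
        for band, row in zip((low, d_cols, d_rows, d_diag), rows):
            band.append(row)
    return low, d_cols, d_rows, d_diag

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def cross_attention(queries, context):
    """Each query token attends over all context tokens (keys = values = context)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    keys_t = [list(col) for col in zip(*context)]
    scores = [[v * scale for v in row] for row in matmul(queries, keys_t)]
    return matmul(softmax_rows(scores), context)

# Toy 4 x 4 image (assumed values).
image = [
    [1.0, 1.0, 4.0, 0.0],
    [1.0, 1.0, 0.0, 4.0],
    [2.0, 0.0, 3.0, 3.0],
    [2.0, 0.0, 3.0, 3.0],
]
low, d_cols, d_rows, d_diag = haar_split(image)

# Low-frequency rows act as queries; rows of all detail bands act as context.
high_tokens = d_cols + d_rows + d_diag
fused = cross_attention(low, high_tokens)
```

In this toy setup the low-frequency tokens pull in detail information weighted by similarity, which is the general idea of cross-attention between two feature sources. A real model would add learned query, key, and value projections and would typically also attend in the opposite direction.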

ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer

Rethinking Multi-Contrast MRI Super-Resolution: Rectangle-Window Cross-Attention Transformer and Arbitrary-Scale Upsampling

Multi-Scale Cross-Attention Fusion Network Based on Image Super-Resolution

Lightweight Multi-Attention Fusion Network for Image Super-Resolution

Cross-Modality High-Frequency Transformer for MR Image Super-Resolution

Multi-attention fusion transformer for single-image super-resolution

Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution

HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution

MadFormer: multi-attention-driven image super-resolution method based on Transformer

Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution

Multi-scale Attention Network for Single Image Super-Resolution

Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution

Enhanced Window-Based Self-Attention with Global and Multi-Scale Representations for Remote Sensing Image Super-Resolution

Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach

Efficient Mixed Transformer for Single Image Super-Resolution

Attention-guided hybrid transformer-convolutional neural network for underwater image super-resolution

Remote Sensing Image Super-Resolution Using Enriched Spatial-Channel Feature Aggregation Networks

DRCT: Saving Image Super-resolution away from Information Bottleneck

Multi-scale attention network for image super-resolution