ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer

Alik Pramanick, Utsav Bheda, Arijit Sur
2024-08-19
Abstract: Recently, transformers have captured significant interest in the area of single-image super-resolution tasks, demonstrating substantial gains in performance. Current models heavily depend on the network's extensive ability to extract high-level semantic details from images while overlooking the effective utilization of multi-scale image details and intermediate information within the network. Furthermore, it has been observed that high-frequency areas in images present significant complexity for super-resolution compared to low-frequency areas. This work proposes a transformer-based super-resolution architecture called ML-CrAIST that addresses this gap by utilizing low-high frequency information in multiple scales. Unlike most previous work, which uses either spatial or channel attention, we apply spatial and channel self-attention together, concurrently modeling pixel interaction along both the spatial and channel dimensions and exploiting the inherent correlations across the spatial and channel axes. Further, we devise a cross-attention block for super-resolution, which explores the correlations between low and high-frequency information. Quantitative and qualitative assessments indicate that our proposed ML-CrAIST surpasses state-of-the-art super-resolution methods (e.g., 0.15 dB gain @Manga109 $\times$4). Code is available at: https://github.com/Alik033/ML-CrAIST
Computer Vision and Pattern Recognition, Image and Video Processing
### What problems does this paper attempt to solve?

This paper aims to solve several key problems in the single-image super-resolution (SISR) task:

1. **Effective utilization of multi-scale information**: Most existing super-resolution models rely on the network's ability to extract high-level semantic details, but ignore the effective utilization of multi-scale image details and intermediate information within the network. To address this, the paper proposes a new architecture that uses low-frequency and high-frequency information at multiple scales.
2. **Complexity of high-frequency regions**: High-frequency regions of an image are more complex than low-frequency regions, which makes them harder to super-resolve. The paper therefore pays special attention to how the information in these high-frequency regions is handled.
3. **Combination of spatial-domain and frequency-domain information**: Many existing methods work mainly in the spatial domain and ignore the potential advantages of the frequency domain, which can offer better ways to recover lost high-frequency information. For this purpose, the paper designs a cross-attention block (CAB) that fuses low-frequency and high-frequency information, thereby improving super-resolution performance.
4. **Multi-scale perception framework**: A single image may contain repeated texture patterns (such as building facades or windows) that appear at different scales in different positions. The paper therefore introduces a multi-scale perception framework that aggregates information from all the different scales of the low-resolution image in order to better capture non-local details.

### Main contributions

1. **Multi-scale model**: A novel multi-scale model is proposed that uses both spatial-domain and frequency-domain features to enhance the spatial resolution of low-resolution images.
2. **Low-and-high-frequency interaction block (LHFIB)**: An LHFIB is introduced that exchanges information between low-frequency and high-frequency sub-bands through the proposed cross-attention block (CAB).
3. **Non-linear fusion method**: A non-linear method based on the attention mechanism is proposed to recover high-frequency details more accurately.
4. **Multi-scale information fusion**: The CAB is used to gather features from different scales while retaining high-resolution features, so that spatial details are represented accurately.

With these improvements, the ML-CrAIST model outperforms existing state-of-the-art super-resolution methods in both quantitative and qualitative evaluations on multiple standard datasets. The abstract singles out the Manga109 dataset, where it reports a 0.15 dB gain at $\times$4.
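
To make the idea of exchanging information between low-frequency and high-frequency sub-bands more concrete, here is a minimal NumPy sketch. It is not the authors' CAB or LHFIB. It assumes a one-level Haar wavelet split as the source of the sub-bands (the summary does not say which transform is used) and plain single-head scaled dot-product cross-attention, with each pixel of the half-resolution bands treated as a token. The names `haar_dwt2`, `cross_attention`, `w_q`, `w_k`, and `w_v` are hypothetical, and the projection weights are random rather than learned.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2-D Haar split (assumed transform, not confirmed by the source).

    Returns one low-frequency band and three high-frequency detail bands,
    each at half the input resolution. Height and width must be even.
    """
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0       # local average (low frequency)
    detail_v = (a + b - c - d) / 2.0  # difference between rows
    detail_h = (a - b + c - d) / 2.0  # difference between columns
    detail_d = (a - b - c + d) / 2.0  # diagonal difference
    return low, detail_v, detail_h, detail_d


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries from one band, keys/values from another."""
    q = query_tokens @ w_q     # (N, d)
    k = context_tokens @ w_k   # (M, d)
    v = context_tokens @ w_v   # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) similarity scores
    weights = softmax(scores, axis=-1)       # each query's weights sum to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
img = rng.random((8, 8))  # toy single-channel "image"

low, dv, dh, dd = haar_dwt2(img)
low_tokens = low.reshape(-1, 1)                              # (16, 1)
high_tokens = np.stack([dv, dh, dd], axis=-1).reshape(-1, 3)  # (16, 3)

d_model = 4  # illustrative embedding width
w_q = rng.standard_normal((1, d_model))
w_k = rng.standard_normal((3, d_model))
w_v = rng.standard_normal((3, d_model))

# Low-frequency tokens query the high-frequency tokens.
out, attn = cross_attention(low_tokens, high_tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16, 4) (16, 16)
```

Swapping the roles of `low_tokens` and `high_tokens` (and the matching projection shapes) gives the opposite direction, where the high-frequency band attends to the low-frequency one. A multi-scale variant could apply `haar_dwt2` again to `low`, although how the paper combines scales is not specified in this summary.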