Abstract: Transformer-based backbones have recently shown superior performance over their convolutional counterparts in computer vision. Because global attention has quadratic complexity with respect to the number of tokens, low-level image processing typically adopts local attention, which has linear complexity. However, the resulting limited receptive field harms performance. In this paper, motivated by Octave convolution, we propose a transformer-based single image super-resolution (SISR) model that explicitly embeds dynamic frequency decomposition into the standard local transformer. All frequency components are continuously updated via intra-scale attention and re-assigned via inter-scale interaction. Specifically, attention at low resolution is sufficient for low-frequency features, which both enlarges the receptive field and reduces the complexity. Compared with the standard local transformer, the proposed FDRTran layer reduces both FLOPs and parameters. By contrast, Octave convolution only reduces the FLOPs of the standard convolution and leaves the number of parameters unchanged. In addition, we propose a restart mechanism, applied after every few frequency updates, which first fuses the low- and high-frequency features and then decomposes them again. In this way, the features are decomposed from multiple viewpoints by learnable parameters, which avoids the risk of early saturation of the frequency representation. Based on the FDRTran layer with the restart mechanism, the proposed FDRNet is the first transformer backbone for SISR to explore the Octave design. Extensive experiments show that our model reaches state-of-the-art performance on 6 synthetic and real datasets. The code and the models are available at https://github.com/catnip1029/FDRNet.
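The complexity argument in the abstract can be illustrated with a rough back-of-the-envelope count. The sketch below is not the FDRNet implementation: the function names, the even channel split, the window size of 8, and the half-resolution low-frequency branch (the convention used by Octave convolution) are all assumptions chosen for illustration. It counts only the two matrix products of attention (scores and weighted sum) and ignores projections, softmax, and the inter-scale interaction.

```python
def global_attention_cost(height, width, channels):
    """Approximate multiply-adds of global attention: quadratic in the token count."""
    tokens = height * width
    return 2 * tokens * tokens * channels


def window_attention_cost(height, width, window, channels):
    """Approximate multiply-adds of local (window) attention: linear in the token count.

    Each token attends only to the window * window tokens in its own window.
    """
    tokens = height * width
    return 2 * tokens * window * window * channels


def split_attention_cost(height, width, window, c_high, c_low, scale=2):
    """Cost when channels are split into two branches (hypothetical split).

    The high-frequency branch keeps full resolution; the low-frequency branch
    runs at 1/scale resolution (scale=2 is an assumption, following Octave convolution).
    """
    high = window_attention_cost(height, width, window, c_high)
    low = window_attention_cost(height // scale, width // scale, window, c_low)
    return high + low


def effective_window(window, scale=2):
    """Full-resolution extent covered by one window on the downscaled branch."""
    return window * scale


if __name__ == "__main__":
    h, w, win, c = 64, 64, 8, 60  # illustrative sizes, not taken from the paper
    print("global :", global_attention_cost(h, w, c))
    print("window :", window_attention_cost(h, w, win, c))
    print("split  :", split_attention_cost(h, w, win, c // 2, c // 2))
    print("low-branch window covers", effective_window(win), "full-resolution pixels per side")
```

Under these assumptions, the low-resolution branch processes a quarter of the tokens with the same window size, so the split layer costs less than the plain window layer while each low-frequency window spans twice the extent at full resolution. The actual savings of the FDRTran layer depend on its real channel ratio and interaction modules, which this sketch does not model.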

Transformer-based image super-resolution and its lightweight

Lightweight Image Super-Resolution Network Using 3D Convolutional Neural Networks

NoUCSR: Efficient Super-Resolution Network Without Upsampling Convolution.

Lightweight Multi-Attention Fusion Network for Image Super-Resolution

A Residual Network with Efficient Transformer for Lightweight Image Super-Resolution

Efficient Transformer for Single Image Super-Resolution.

Transforming Image Super-Resolution: A ConvFormer-Based Efficient Approach

Incorporating Transformer Designs into Convolutions for Lightweight Image Super-Resolution

Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution

Lightweight single-image super-resolution network based on dual paths

Lightweight Image Super-Resolution with Pyramid Clustering Transformer

Lightweight image super-resolution based on stepwise feedback mechanism and multi-feature maps fusion

Lightweight Single Image Super-Resolution via Efficient Mixture of Transformers and Convolutional Networks

A very lightweight and efficient image super-resolution network

Image Super-resolution via Efficient Transformer Embedding Frequency Decomposition with Restart

An Efficient Hybrid CNN-Transformer Approach for Remote Sensing Super-Resolution

DRCT: Saving Image Super-resolution away from Information Bottleneck

HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

Super-Resolution Algorithm Based on Transformer+CNN

Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution