Implicit Grid Convolution for Multi-Scale Image Super-Resolution

Dongheon Lee, Seokju Yun, Youngmin Ro
2024-08-19
Abstract: Recently, Super-Resolution (SR) achieved significant performance improvement by employing neural networks. Most SR methods conventionally train a single model for each targeted scale, which increases redundancy in training and deployment in proportion to the number of scales targeted. This paper challenges this conventional fixed-scale approach. Our preliminary analysis reveals that, surprisingly, encoders trained at different scales extract similar features from images. Furthermore, the commonly used scale-specific upsampler, Sub-Pixel Convolution (SPConv), exhibits significant inter-scale correlations. Based on these observations, we propose a framework for training multiple integer scales simultaneously with a single model. We use a single encoder to extract features and introduce a novel upsampler, Implicit Grid Convolution (IGConv), which integrates SPConv at all scales within a single module to predict multiple scales. Our extensive experiments demonstrate that training multiple scales with a single model reduces the training budget and stored parameters by one-third while achieving equivalent inference latency and comparable performance. Furthermore, we propose IGConv+, which addresses spectral bias and input-independent upsampling and uses ensemble prediction to improve performance. As a result, SRFormer-IGConv+ achieves a remarkable 0.25 dB improvement in PSNR at Urban100 ×4 while reducing the training budget, stored parameters, and inference cost compared to the existing SRFormer.
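To put the reported PSNR gain in context, the sketch below computes PSNR from the mean squared error between a reference image and an estimate. It is a minimal illustration, not the paper's evaluation protocol: the function names and the flat pixel-list layout are assumptions made here, and SR benchmarks often apply extra steps (for example border cropping or luminance conversion) that are omitted.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two images given as flat pixel lists."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks when PSNR rises by gain_db (same peak value)."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, a 0.25 dB gain corresponds to a mean squared error about 5.6% lower at the same peak value (`mse_ratio_for_gain(0.25)` is roughly 0.944).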
Computer Vision and Pattern Recognition
### What problem does this paper attempt to address?

This paper addresses the redundancy in conventional super-resolution (SR) methods. These methods usually train a separate model for each target scale, which wastes resources in both training and deployment. For example, supporting ×2, ×3, and ×4 requires training and storing three separate models, so training cost and storage grow with the number of scales.

#### Main problems

1. **Redundancy in fixed-scale training**: Most SR methods train a dedicated model for each target scale. This increases the training budget and the number of stored parameters, and it makes the search for optimal hyperparameters harder.
2. **Cross-scale feature similarity is underused**: Encoders trained at different scales extract very similar features, yet conventional methods do not exploit this similarity to simplify model design and training.
3. **Limitations of existing upsamplers**: The commonly used Sub-Pixel Convolution (SPConv) upsampler exhibits significant correlations across scales, but existing methods do not exploit them to improve efficiency and performance.

#### New methods proposed in the paper

To address these problems, the authors propose the following:

1. **Multi-scale training framework**: By introducing Implicit Grid Convolution (IGConv), the authors train multiple integer scales simultaneously in a single model. This reduces the training budget and the number of stored parameters while maintaining or improving performance.
2. **New upsampler, IGConv**: IGConv integrates SPConv at all scales into a single module and uses cross-scale correlations to predict the outputs for multiple scales. The authors also propose an improved version, IGConv+, which further improves performance through a frequency loss, implicit grid sampling (IGSample), and feature-level geometric re-parameterization (FGRep).
3. **Reduced computational cost**: Compared with SPConv and SPConv+, IGConv and IGConv+ improve performance and are also more efficient in inference time and memory usage.

With these changes, the authors report that the multi-scale training framework is effective across a variety of datasets and models.
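For readers unfamiliar with SPConv, the baseline upsampler discussed above, the following sketch shows the pixel-shuffle (depth-to-space) rearrangement at its core and why the layer is tied to one scale. It is a minimal pure-Python illustration under stated assumptions: the names `pixel_shuffle` and `spconv_channels` are hypothetical, the channel ordering follows the common depth-to-space convention, and the preceding convolution is omitted. It does not reproduce IGConv, whose internals are not described here.

```python
def spconv_channels(out_channels, scale):
    """Channels the convolution before the shuffle must produce for a given scale."""
    return out_channels * scale * scale


def pixel_shuffle(features, scale):
    """Rearrange a [C * scale**2][H][W] nested list into [C][H * scale][W * scale].

    Input channel c * scale**2 + dy * scale + dx supplies output pixel
    (y * scale + dy, x * scale + dx) of output channel c.
    """
    r2 = scale * scale
    if len(features) % r2 != 0:
        raise ValueError("channel count must be divisible by scale**2")
    out_channels = len(features) // r2
    height, width = len(features[0]), len(features[0][0])
    out = [[[0.0] * (width * scale) for _ in range(height * scale)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for dy in range(scale):
            for dx in range(scale):
                src = features[c * r2 + dy * scale + dx]
                for y in range(height):
                    for x in range(width):
                        out[c][y * scale + dy][x * scale + dx] = src[y][x]
    return out


# An RGB output needs a different channel count at every scale,
# which is why a fixed-scale SPConv head cannot be reused across scales as is.
channels_per_scale = {s: spconv_channels(3, s) for s in (2, 3, 4)}
```

Here `channels_per_scale` evaluates to `{2: 12, 3: 27, 4: 48}`: each scale needs its own output width for the final convolution, which is the scale-specific redundancy that a unified upsampler aims to remove.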