An efficient parallel fusion structure of distilled and transformer-enhanced modules for lightweight image super-resolution

Guanqiang Wang, Mingsong Chen, Yongcheng Lin, Xianhua Tan, Chizhou Zhang, Wenxin Yao, Baihui Gao, Weidong Zeng
DOI: https://doi.org/10.1007/s00371-023-03243-9
IF: 2.835
2024-01-23
The Visual Computer
Abstract: Although convolutional neural network (CNN) and transformer methods have greatly advanced image super-resolution, each has its own disadvantages. Striking a trade-off between the two and effectively integrating their advantages can restore the high-frequency information of images at higher quality with fewer parameters. Hence, in this study, a novel dual parallel fusion structure of distilled feature pyramid and serial CNN and transformer (PFDFCT) model is proposed. In one branch, a lightweight serial structure of CNN and transformer is implemented to guarantee the richness of the global features extracted by the transformer. In the other branch, an efficient distillation feature pyramid hybrid attention module is designed to purify the local features extracted by the CNN and maintain their integrity through cross-fusion. Such a multi-path parallel fusion strategy ensures the richness and accuracy of features while avoiding complex and long-range structures. The results show that, compared to other advanced methods, PFDFCT reduces mis-generated stripes and makes the reconstructed image clearer for both easy-to-reconstruct and difficult-to-reconstruct targets. Additionally, PFDFCT achieves a remarkable improvement in model size and computational cost. Compared to the 2022 state-of-the-art (SOTA) model, the efficient long-range attention network, PFDFCT reduces parameters and floating-point operations (FLOPs) by more than 20% and 26%, respectively, at all three scales, while maintaining similarly advanced reconstruction ability. The FLOPs of PFDFCT are as low as 31.8G, 55.3G, and 122.5G at scales of 2, 3, and 4, respectively, which are much lower than those of most current SOTA methods.
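The FLOPs figures above can be read together with the "more than 26%" reduction claim. The following is a minimal arithmetic sketch, not taken from the paper: it only derives the smallest baseline FLOPs that would be consistent with the abstract's two statements. The function name `implied_baseline_lower_bound` is hypothetical, and the resulting values are implied bounds, not the reported FLOPs of the efficient long-range attention network.

```python
# PFDFCT FLOPs as stated in the abstract, in GFLOPs, keyed by upscaling factor.
PFDFCT_GFLOPS = {2: 31.8, 3: 55.3, 4: 122.5}

# The abstract claims a FLOPs reduction of "more than 26%" at every scale.
MIN_FLOPS_REDUCTION = 0.26


def implied_baseline_lower_bound(reduced: float, reduction: float) -> float:
    """Return the smallest baseline cost for which `reduced` is at least
    `reduction` (a fraction in [0, 1)) lower than the baseline.

    From reduced = baseline * (1 - r) with r >= reduction, it follows that
    baseline >= reduced / (1 - reduction).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be a fraction in [0, 1)")
    return reduced / (1.0 - reduction)


# Implied lower bounds on the baseline FLOPs (assumption: derived, not reported).
bounds = {
    scale: round(implied_baseline_lower_bound(gflops, MIN_FLOPS_REDUCTION), 1)
    for scale, gflops in PFDFCT_GFLOPS.items()
}

for scale, bound in sorted(bounds.items()):
    print(f"x{scale}: PFDFCT {PFDFCT_GFLOPS[scale]}G, implied baseline >= {bound}G")
```

The same helper applies to the parameter count with a reduction of 0.20, but the abstract does not state PFDFCT's parameter count, so that case is left out.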
computer science, software engineering