Image super-resolution: prefix-tuning transformer from large to small datasets

Hui Ma, Dongli Jia, Jiejie Xiao, Xu Su
DOI: https://doi.org/10.1007/s11760-023-02946-9
IF: 1.583
2024-01-07
Signal, Image and Video Processing
Abstract: Pre-training methods can transfer the representation ability of upstream models to new sub-tasks in image super-resolution. Compared with training from scratch, starting from a pre-trained model converges earlier and reaches better evaluation results. However, the method currently used to adapt a pre-trained model to sub-tasks is full fine-tuning (FFT), in which all parameters of the pre-trained model are updated. FFT still incurs relatively high computational and iteration costs, and each sub-task must store its own full set of parameters, which creates a heavy storage burden. To address these issues, this paper proposes a novel super-resolution training method based on prefix and prompt fine-tuning (SRPPT). In SRPPT, a small number of prefix and prompt parameters are added to the self-attention module of an existing transformer-based super-resolution model. During training, the original model parameters are frozen, and only the newly added prefix or prompt parameters are updated. Experiments show that when a model pre-trained on ImageNet is adapted to DF2K, SRPPT updates fewer than 1/200 as many parameters as FFT and needs only half as many iterations. Moreover, SRPPT matches or exceeds FFT on evaluation metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). In addition, each sub-task only needs to store its own small set of parameters, while all sub-tasks share the same backbone model parameters.
engineering, electrical & electronic, imaging science & photographic technology
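The abstract's core mechanism is a frozen transformer whose self-attention receives a few extra trainable parameters per sub-task. It can be illustrated with a minimal single-head sketch in Python/NumPy. This is not the authors' SRPPT implementation. The class name `PrefixSelfAttention`, the single-head layout, the random stand-in "pre-trained" weights, and the choice to prepend learnable vectors to the keys and values are all illustrative assumptions. The last of these is the common prefix-tuning formulation, not necessarily the paper's exact placement.

```python
import numpy as np


class PrefixSelfAttention:
    """Single-head self-attention with frozen projections and a trainable prefix.

    Illustrative sketch only (not the SRPPT code): W_q, W_k, W_v stand in for the
    pre-trained weights of a transformer SR backbone and are never updated;
    P_k and P_v are the hypothetical per-task prefix parameters that would be trained.
    """

    def __init__(self, dim: int, prefix_len: int, rng: np.random.Generator):
        scale = 1.0 / np.sqrt(dim)
        # Frozen backbone weights, shared by all sub-tasks.
        self.frozen = {
            name: rng.normal(scale=scale, size=(dim, dim))
            for name in ("W_q", "W_k", "W_v")
        }
        # Trainable per-task prefix: extra key/value vectors prepended in attention.
        self.prefix = {
            name: rng.normal(scale=scale, size=(prefix_len, dim))
            for name in ("P_k", "P_v")
        }

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (num_tokens, dim), e.g. features of the patches in one window.
        q = x @ self.frozen["W_q"]
        k = np.concatenate([self.prefix["P_k"], x @ self.frozen["W_k"]], axis=0)
        v = np.concatenate([self.prefix["P_v"], x @ self.frozen["W_v"]], axis=0)
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        # Output keeps the input shape: the prefix changes what each token
        # attends to, not the number of output tokens.
        return weights @ v

    def trainable_count(self) -> int:
        return sum(p.size for p in self.prefix.values())

    def frozen_count(self) -> int:
        return sum(w.size for w in self.frozen.values())


rng = np.random.default_rng(0)
layer = PrefixSelfAttention(dim=64, prefix_len=8, rng=rng)
x = rng.normal(size=(16, 64))
out = layer(x)
print(out.shape)  # (16, 64)
print(layer.trainable_count(), layer.frozen_count())  # 1024 12288
```

In a training loop, only the arrays in `layer.prefix` would be passed to the optimizer. A new sub-task would get a fresh `prefix` while reusing the same `frozen` dictionary, which is what keeps per-task storage small. The printed counts describe this toy layer only and are not the parameter ratio reported for SRPPT, which depends on the real backbone and the authors' prefix/prompt configuration. Prompt tuning, the other variant named in the abstract, is not shown here. It is commonly implemented by prepending learnable tokens to the input sequence instead of to the keys and values.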