AIM 2020 Challenge on Video Extreme Super-Resolution: Methods and Results

Dario Fuoli, Zhiwu Huang, Shuhang Gu, Radu Timofte, Arnau Raventos, Aryan Esfandiari, Salah Karout, Xuan Xu, Xin Li, Xin Xiong, Jinge Wang, Pablo Navarrete Michelini, Wenhao Zhang, Dongyang Zhang, Hanwei Zhu, Dan Xia, Haoyu Chen, Jinjin Gu, Zhi Zhang, Tongtong Zhao, Shanshan Zhao, Kazutoshi Akita, Norimichi Ukita, Hrishikesh P S, Densen Puthussery, Jiji C V
DOI: https://doi.org/10.48550/arXiv.2009.06290
2020-09-14
Abstract: This paper reviews the video extreme super-resolution challenge associated with the AIM 2020 workshop at ECCV 2020. Common scaling factors for learned video super-resolution (VSR) do not exceed 4. In this range, missing information can be restored well, especially in HR videos, where the high-frequency content mostly consists of texture details. The task in this challenge is to upscale videos by an extreme factor of 16, which results in more severe degradations that also affect the structural integrity of the videos: a single pixel in the low-resolution (LR) domain corresponds to 256 (16 × 16) pixels in the high-resolution (HR) domain. Due to this massive information loss, it is hard to restore the missing information accurately. Track 1 is set up to gauge the state of the art for such a demanding task, with fidelity to the ground truth measured by PSNR and SSIM. Higher perceptual quality can be achieved at the expense of fidelity by generating plausible high-frequency content. Track 2 therefore aims at generating visually pleasing results, which are ranked according to human perception as evaluated by a user study. In contrast to single image super-resolution (SISR), VSR can benefit from additional information in the temporal domain. However, this also imposes an additional requirement, as the generated frames need to be temporally consistent.
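Two quantitative points in the abstract, the 256-pixel footprint of one LR pixel at factor 16 and fidelity scoring by PSNR, can be made concrete with a short sketch. It assumes 8-bit intensities (peak value 255) and computes a plain PSNR over flattened pixel values. The challenge's exact evaluation protocol (color space, border cropping, per-frame averaging) is not described here, so the helper names `hr_pixels_per_lr_pixel` and `psnr` are hypothetical, and SSIM is omitted.

```python
import math
from typing import Sequence


def hr_pixels_per_lr_pixel(scale: int) -> int:
    """HR pixels covered by one LR pixel for an integer upscaling factor."""
    # Upscaling by `scale` along both height and width multiplies the
    # pixel count by scale**2; for scale 16 this gives 256.
    return scale * scale


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Generic PSNR in dB between two flattened images (assumes 8-bit range by default)."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(hr_pixels_per_lr_pixel(16))
    print(psnr([0, 0, 0, 0], [1, 1, 1, 1]))
```

Because PSNR is a monotone function of the mean squared error, it rewards pixel-wise fidelity; this is why Track 2 relies on a user study rather than PSNR to rank perceptual quality.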
Computer Vision and Pattern Recognition