Efficient Super Resolution For Large-Scale Images Using Attentional GAN

Harsh Nilesh Pathak, Xinxin Li, Shervin Minaee, Brooke Cowan
DOI: https://doi.org/10.48550/arXiv.1812.04821
2019-01-13
Abstract: Single Image Super Resolution (SISR) is a well-researched problem with broad commercial relevance. However, most of the SISR literature focuses on small-size images under 500px, whereas business needs can mandate the generation of very high resolution images. At Expedia Group, we were tasked with generating images of at least 2000px for display on the website, four times greater than the sizes typically reported in the literature. This requirement poses a challenge that state-of-the-art models, validated on small images, have not been proven to handle. In this paper, we investigate solutions to the problem of generating high-quality images for large-scale super resolution in a commercial setting. We find that training a generative adversarial network (GAN) with attention from scratch using a large-scale lodging image data set generates images with high PSNR and SSIM scores. We describe a novel attentional SISR model for large-scale images, A-SRGAN, that uses a Flexible Self Attention layer to enable processing of large-scale images. We also describe a distributed algorithm which speeds up training by around a factor of five.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to produce high-quality single image super resolution (SISR) output at large scale in a commercial setting. Existing techniques mainly target small images (usually under 500 pixels), whereas the task here requires images of at least 2,000 pixels. This matters for companies such as Expedia Group, which display high-resolution pictures on their websites, and the effectiveness of existing super-resolution models (such as SRGAN, SRCNN, and ESPCN) at these sizes had not been verified. The goal is therefore to find models that can be trained and run efficiently on large images while meeting visual-quality targets of PSNR > 25 dB and SSIM > 0.75. The authors pursued this goal in four steps:

1. **Applying pre-trained models**: They first applied pre-trained SR models directly to Expedia Group's lodging image dataset, but these models did not reach the PSNR and SSIM targets.
2. **Fine-tuning pre-trained models**: Fine-tuning the weights of the pre-trained models improved PSNR and SSIM, but the outputs still showed unwanted artifacts such as ringing and blurred edges.
3. **End-to-end training**: They then trained two models from scratch: SRGAN and a novel attentional super-resolution GAN (A-SRGAN). A-SRGAN adds a Flexible Self Attention (FSA) layer that strengthens the model's ability to capture long-range dependencies in large images, improving image quality and object consistency.
4. **Distributed training algorithm**: To make training more efficient, they developed a distributed algorithm that uses multiple GPUs and cuts training time by roughly a factor of five.

With these methods, the paper both evaluates how existing SR models perform on large images and proposes a new model and training strategy that better meet the high-resolution requirements of commercial applications.
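PSNR and SSIM are the two quality metrics behind the paper's targets (PSNR > 25 dB, SSIM > 0.75). As a minimal sketch, assuming single-channel 8-bit images stored as NumPy arrays, the functions below compute PSNR as 10·log10(MAX²/MSE) and a simplified single-window (global) SSIM with the usual constants C1 = (0.01·MAX)² and C2 = (0.03·MAX)². Standard SSIM averages this statistic over local Gaussian windows, so values from this simplification will not match library implementations exactly, and the paper's evaluation protocol (color space, border cropping) is not described here. The names `psnr`, `global_ssim`, and `meets_targets` are illustrative only.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def global_ssim(reference: np.ndarray, estimate: np.ndarray, max_value: float = 255.0) -> float:
    """SSIM computed over the whole image as one window.

    Simplification: standard SSIM averages this over local Gaussian windows.
    """
    x = reference.astype(np.float64)
    y = estimate.astype(np.float64)
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def meets_targets(reference, estimate, psnr_min: float = 25.0, ssim_min: float = 0.75) -> bool:
    """Check an SR output against the PSNR > 25 dB and SSIM > 0.75 targets."""
    return psnr(reference, estimate) > psnr_min and global_ssim(reference, estimate) > ssim_min
```

For example, comparing a ground-truth high-resolution array `hr` with a reconstruction `sr` of the same shape, `meets_targets(hr, sr)` reports whether both thresholds are exceeded.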
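Self-attention is what lets a model like A-SRGAN relate distant regions of an image, and its cost shows why large images are hard: attending over a feature map with N = H·W positions produces an N × N attention matrix. A 500 × 500 map already gives N = 250,000 and a float32 attention matrix of 250,000² × 4 bytes = 250 GB. The sketch below is not the paper's Flexible Self Attention layer, whose internals are not described here. It implements standard SAGAN-style self-attention (output = x + γ·attention(x), with γ initialized to 0) in NumPy, plus a hypothetical `kv_stride` parameter that average-pools keys and values to shrink the attention matrix, as one possible way to bound memory on large inputs. All function names are illustrative.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def avg_pool(x: np.ndarray, stride: int) -> np.ndarray:
    """Average-pool an (H, W, C) map by an integer stride."""
    h, w, c = x.shape
    if h % stride or w % stride:
        raise ValueError("H and W must be divisible by stride")
    return x.reshape(h // stride, stride, w // stride, stride, c).mean(axis=(1, 3))


def self_attention(x, wq, wk, wv, gamma=0.0, kv_stride=1):
    """SAGAN-style self-attention over an (H, W, C) feature map.

    kv_stride > 1 is a HYPOTHETICAL memory-saving option (not from the paper):
    keys/values come from a pooled map, so the attention matrix is
    (H*W) x (H*W / kv_stride**2) instead of (H*W) x (H*W).
    """
    h, w, c = x.shape
    q = x.reshape(-1, c) @ wq                    # (N, d) queries, N = H*W
    kv_src = avg_pool(x, kv_stride) if kv_stride > 1 else x
    kv = kv_src.reshape(-1, c)
    k = kv @ wk                                  # (M, d) keys
    v = kv @ wv                                  # (M, C) values
    attn = softmax(q @ k.T, axis=-1)             # (N, M); each row sums to 1
    out = (attn @ v).reshape(h, w, c)
    return x + gamma * out                       # residual connection scaled by gamma


def attention_matrix_bytes(h: int, w: int, kv_stride: int = 1, bytes_per_elem: int = 4) -> int:
    """Memory needed for one attention matrix (float32 by default)."""
    n = h * w
    m = (h // kv_stride) * (w // kv_stride)
    return n * m * bytes_per_elem
```

For instance, `attention_matrix_bytes(500, 500)` gives the 250 GB figure, and `kv_stride=4` divides it by 16. Pooling keys and values trades some spatial precision in what each query can attend to for a much smaller attention matrix.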
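The summary does not describe how the authors' distributed algorithm works. One common way to use several GPUs is synchronous data parallelism, sketched here under that assumption with simulated workers in NumPy on a toy linear least-squares model: each worker computes a gradient on its own shard of the batch, the gradients are averaged (the job an all-reduce performs across GPUs), and every replica applies the identical update. With equal-size shards, the averaged gradient equals the full-batch gradient, so the update matches single-device training while the per-worker compute shrinks. The function names are hypothetical, and this sketch does not reproduce the paper's roughly fivefold speedup.

```python
import numpy as np


def mse_grad(w: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient of 0.5 * mean((x @ w - y)**2) with respect to w."""
    return x.T @ (x @ w - y) / len(y)


def data_parallel_step(w, x, y, num_workers: int, lr: float) -> np.ndarray:
    """One synchronous data-parallel SGD step with simulated workers.

    Assumes the batch splits into equal-size shards; with unequal shards,
    a plain average would weight samples unequally.
    """
    x_shards = np.array_split(x, num_workers)
    y_shards = np.array_split(y, num_workers)
    # Each "worker" computes a local gradient on its shard.
    grads = [mse_grad(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    # Averaging plays the role of an all-reduce across GPUs.
    avg_grad = np.mean(grads, axis=0)
    # Every replica applies the same update, keeping weights in sync.
    return w - lr * avg_grad
```

Because the averaged gradient matches the full-batch one, repeated calls converge to the same solution as ordinary gradient descent; the benefit lies in wall-clock time, since each worker processes only a fraction of the batch.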