SRGAN: Training Dataset Matters

Nao Takano, Gita Alaghband
DOI: https://doi.org/10.48550/arXiv.1903.09922
2019-03-24
Abstract: Generative Adversarial Networks (GANs) in supervised settings can generate photo-realistic output from a corresponding low-definition input (SRGAN). Using the architecture presented in the original SRGAN paper [2], we explore how the choice of dataset affects the outcome. Experiments with three different datasets show that SRGAN fundamentally learns objects, with their shape, color, and texture, and redraws them in the output rather than merely attempting to sharpen edges. This is further underscored by our demonstration that, once the network has learned the images of a dataset, it can generate a photo-like image from a sketch with very blurry edges that gives only a slight hint of what the original might look like. Given a set of inference images, a network trained on the same dataset produces a better outcome than one trained on an arbitrary set of images, and we report the significance of this difference numerically with the Fréchet Inception Distance score [22].
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper examines how the choice of training dataset affects the results of image super-resolution with Generative Adversarial Networks (GANs). Specifically, the authors explore the following questions:

1. **Does the choice of training dataset affect the performance of SRGAN?** For example, to reconstruct high-resolution facial images from low-resolution ones, should the network be trained on a dataset of facial images, or will any type of dataset do?
2. **How does SRGAN generate super-resolution images by learning specific types of images?** The authors verified experimentally that, after learning the objects in the images (including their shapes, colors, and textures), SRGAN generates super-resolution images more accurately, rather than just sharpening edges.

To answer these questions, the authors trained SRGAN on three different datasets (CelebA, LSUN Dining Room, and LSUN Tower) and evaluated how each dataset affects the quality of the generated images. The main contributions include:

- Using Fréchet Inception Distance (FID) scores to demonstrate that, for the best results, the training dataset should contain the same type of images as those used during inference.
- Showcasing the potential of SRGAN in other applications, such as image colorization and edge-to-photo translation, both image-to-image translation tasks.
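The FID comparison mentioned above rests on the Fréchet distance between two Gaussians fitted to image features. The following is a minimal sketch of that distance, not the authors' evaluation code. The function name `frechet_distance` is hypothetical, and random arrays stand in for the Inception-v3 features that a real FID computation would extract from the reference and generated images.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each argument is an array of shape (n_samples, n_features). In a real
    FID computation the rows would be Inception-v3 activations; here they
    are arbitrary feature vectors (an assumption made for illustration).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b. These are real and non-negative for
    # covariance matrices, so clip away small numerical noise.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Stand-in features: 200 samples with 4 dimensions each.
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 4))
shifted = reference + 2.0  # same covariance, shifted mean

print(frechet_distance(reference, reference))
print(frechet_distance(reference, shifted))
```

A lower value means the two feature distributions are closer. In the paper's setting, a network trained on the same kind of images as the inference set would be expected to yield a lower score than one trained on an unrelated dataset.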