One-shot Ultra-high-Resolution Generative Adversarial Network That Synthesizes 16K Images On A Single GPU

Junseok Oh, Donghwee Yoon, Injung Kim
2023-08-28
Abstract: We propose a one-shot ultra-high-resolution generative adversarial network (OUR-GAN) framework that generates non-repetitive 16K (16,384 x 8,640) images from a single training image and is trainable on a single consumer GPU. OUR-GAN generates an initial image that is visually plausible and varied in shape at low resolution, and then gradually increases the resolution by adding detail through super-resolution. Since OUR-GAN learns from a real ultra-high-resolution (UHR) image, it can synthesize large shapes with fine details and long-range coherence, which is difficult to achieve with conventional generative models that rely on the patch distribution learned from relatively small images. OUR-GAN can synthesize high-quality 16K images with 12.5 GB of GPU memory and 4K images with only 4.29 GB as it synthesizes a UHR image part by part through seamless subregion-wise super-resolution. Additionally, OUR-GAN improves visual coherence while maintaining diversity by applying vertical positional convolution. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherency, and diversity compared with the baseline one-shot synthesis models. To the best of our knowledge, OUR-GAN is the first one-shot image synthesizer that generates non-repetitive UHR images on a single consumer GPU. The synthesized image samples are presented at https://our-gan.github.io.
Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing
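The abstract attributes the low memory footprint to synthesizing the UHR image part by part. For scale: one 16,384 x 8,640 RGB image stored as float32 already takes 16,384 x 8,640 x 3 x 4 bytes, about 1.58 GiB, and a single hypothetical 64-channel float32 feature map at that resolution would need 33.75 GiB, well beyond the 12.5 GB reported for 16K synthesis. The following Python/NumPy code is a minimal sketch of generic overlap-and-blend tiled upscaling, offered as an illustration under assumptions rather than the paper's algorithm: `nearest_upscale` is a hypothetical placeholder for the super-resolution network, and the tile size, overlap, and linear blending ramp are illustrative choices.

```python
import numpy as np


def nearest_upscale(tile: np.ndarray, scale: int) -> np.ndarray:
    """Hypothetical stand-in for a super-resolution network: nearest-neighbour upsampling."""
    return np.repeat(np.repeat(tile, scale, axis=0), scale, axis=1)


def ramp_weights(h: int, w: int, overlap: int) -> np.ndarray:
    """2-D blending weights that fall off linearly toward the tile borders."""
    def ramp(n: int) -> np.ndarray:
        r = np.ones(n, dtype=np.float64)
        k = min(overlap, n // 2)
        if k > 0:
            # Strictly positive values so every pixel receives some weight.
            edge = (np.arange(k) + 1) / (k + 1)
            r[:k] = edge
            r[-k:] = edge[::-1]
        return r
    return np.outer(ramp(h), ramp(w))


def tile_starts(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets so that overlapping tiles cover [0, length) completely."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def tiled_super_resolve(image: np.ndarray, scale: int, tile: int = 64,
                        overlap: int = 8, upscale_fn=nearest_upscale) -> np.ndarray:
    """Upscale an (H, W, C) image one overlapping subregion at a time.

    Apart from the full-size output and weight accumulators, only one
    upscaled tile exists at a time. Overlapping tiles are averaged with
    weights that fade toward each tile's edges, which hides seams.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    h, w, c = image.shape
    out = np.zeros((h * scale, w * scale, c), dtype=np.float64)
    weight_sum = np.zeros((h * scale, w * scale, 1), dtype=np.float64)
    stride = tile - overlap
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            patch = image[y:y + tile, x:x + tile]
            up = upscale_fn(patch, scale)
            ph, pw = up.shape[:2]
            wgt = ramp_weights(ph, pw, overlap * scale)[..., None]
            ys, xs = y * scale, x * scale
            out[ys:ys + ph, xs:xs + pw] += up * wgt
            weight_sum[ys:ys + ph, xs:xs + pw] += wgt
    # Every output pixel is covered by at least one tile with positive weight.
    return out / weight_sum
```

With `nearest_upscale` the tiled result equals upscaling the whole image at once, which makes the stitching logic easy to check. With a real network, pixels near a tile border see different context than they would in the full image, and the overlapping, linearly weighted blend is what suppresses visible seams. This sketch still keeps the full-size output in memory; a more memory-constrained variant could write finished regions out incrementally.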
What problem does this paper attempt to address?
This paper addresses the problem of generating non-repetitive 16K (16,384 × 8,640) ultra-high-resolution images from a single training image on a single consumer-grade GPU. Existing generative models typically require large amounts of training data and are limited in output resolution, visual coherence, and diversity. The paper proposes a new generative adversarial network framework, OUR-GAN, to overcome these limitations, with the following goals:

1. **Ultra-high-resolution output**: Generate 16K images, a resolution that existing models struggle to reach.
2. **One-shot training**: Train on only one image, which suits scenarios where collecting data is costly or large datasets are unavailable.
3. **Low resource requirements**: Run on a single consumer-grade GPU (12.5 GB of GPU memory for 16K output, according to the abstract).
4. **Visual coherence and diversity**: Produce images that are not only high-resolution but also visually coherent and diverse.

To achieve these goals, OUR-GAN employs the following techniques:

- **Seamless subregion-wise super-resolution**: Starting from a low-resolution generated image, the framework raises the resolution step by step, super-resolving one overlapping subregion at a time so that no discontinuities appear at subregion boundaries.
- **Vertical positional convolution**: Exploits the vertical position of visual elements in the image to improve the visual coherence of the generated images.
- **Pre-training and fine-tuning**: The super-resolution model is pre-trained on public datasets and then fine-tuned on the single training image, which improves the quality of the generated images.
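Vertical positional convolution is described here only at a high level. A minimal sketch, assuming a CoordConv-style design in which a channel holding each row's normalised vertical coordinate is concatenated to the feature map before an ordinary convolution; the paper's actual layer may encode or inject position differently, and the function names are hypothetical. The motivation is that a plain convolution is translation-equivariant, so it cannot tell whether a patch belongs near the top or the bottom of the image; the extra channel breaks vertical equivariance while leaving horizontal equivariance intact (except near the zero-padded borders).

```python
import numpy as np


def add_vertical_coordinate(x: np.ndarray) -> np.ndarray:
    """Append a channel holding each row's normalised vertical position in [-1, 1].

    x has shape (C, H, W); the result has shape (C + 1, H, W).
    """
    c, h, w = x.shape
    ys = np.linspace(-1.0, 1.0, h) if h > 1 else np.zeros(1)
    y_channel = np.broadcast_to(ys[:, None], (h, w))[None]  # shape (1, H, W)
    return np.concatenate([x, y_channel], axis=0)


def conv2d_same(x: np.ndarray, kernel: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Plain 2-D cross-correlation with zero 'same' padding.

    x: (C_in, H, W); kernel: (C_out, C_in, K, K) with odd K; bias: (C_out,).
    """
    c_out, c_in, k, _ = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for dy in range(k):
        for dx in range(k):
            window = xp[:, dy:dy + h, dx:dx + w]  # (C_in, H, W)
            # Mix input channels into output channels for this kernel offset.
            out += np.einsum("oi,ihw->ohw", kernel[:, :, dy, dx], window)
    return out + bias[:, None, None]


def vertical_positional_conv(x: np.ndarray, kernel: np.ndarray,
                             bias: np.ndarray) -> np.ndarray:
    """Convolution that also sees each pixel's row position (hypothetical sketch).

    kernel must have C_in + 1 input channels to account for the coordinate channel.
    """
    return conv2d_same(add_vertical_coordinate(x), kernel, bias)
```

For example, feeding an all-zero input through a 1 x 1 kernel that reads only the coordinate channel yields an output that varies from -1 at the top row to 1 at the bottom row while staying constant across each row, which is exactly the position dependence a plain convolution lacks.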
In summary, the main contribution of this paper is a framework that generates high-quality, ultra-high-resolution images on a single consumer-grade GPU, addressing the shortcomings of existing generative models in resource requirements and output quality.