CT-GAN: A conditional Generative Adversarial Network of transformer architecture for text-to-image

Xin Zhang, Wentao Jiao, Bing Wang, Xuedong Tian
DOI: https://doi.org/10.1016/j.image.2023.116959
2023-03-18
Abstract: Generating an image from a text description is an imaginative and challenging task. This study proposes CT-GAN, a conditional generative adversarial network (GAN) for text-to-image tasks whose generator is based on the transformer architecture. We also propose a filtering module suitable for non-end-to-end multi-stage models. This module selects the good images generated in the previous stage and allows only those images to participate in the generation of the later stage, which significantly improves the quality of the generated images. Furthermore, we design a generator and a discriminator based on symmetry. In the generator, we propose a shift self-attention technique that establishes information communication between grids, reduces boundary loss, and improves image quality. In the discriminator, we establish two grid-based modes, local and global discrimination, which balance the performance of the generator and discriminator, improve training stability, and accelerate model convergence. We conducted several experiments on the widely used conditional datasets (CUB and COCO) and unconditional datasets (CelebA and LSUN church). The experimental results show that the proposed CT-GAN is superior to state-of-the-art convolutional models in generation diversity and semantic consistency. Code is available at: https://github.com/Jwtcode/CT-GAN .
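The filtering idea in the abstract, passing only the good images from one stage to the next, can be illustrated with a minimal Python sketch. The abstract does not say how image quality is judged, so the scoring function `score_fn`, the cutoff `keep`, and the name `filter_stage_outputs` are all hypothetical and are not taken from the CT-GAN code.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def filter_stage_outputs(
    images: Sequence[T],
    score_fn: Callable[[T], float],
    keep: int,
) -> List[T]:
    """Return the `keep` highest-scoring images from the previous stage.

    `score_fn` is an assumed quality measure (higher is better); the paper's
    actual selection criterion is not described in the abstract.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Rank candidates from best to worst, then keep only the top ones.
    ranked = sorted(images, key=score_fn, reverse=True)
    return ranked[:keep]


# Toy usage: each "image" is a (name, quality score) pair standing in for a
# real low-resolution output of an earlier stage.
candidates = [("img_a", 0.2), ("img_b", 0.9), ("img_c", 0.5)]
selected = filter_stage_outputs(candidates, score_fn=lambda item: item[1], keep=2)
```

In a multi-stage pipeline, only `selected` would be handed to the later, higher-resolution stage, so that stage does not spend capacity refining poor candidates.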
engineering, electrical & electronic
What problem does this paper attempt to address?