Swin-GAN: Generative Adversarial Network Based on Shifted Windows Transformer Architecture for Image Generation

Shibin Wang,Zidiao Gao,Dong Liu
DOI: https://doi.org/10.1007/s00371-022-02714-9
IF: 2.835
2022-01-01
The Visual Computer
Abstract:It is well known that every successful generative adversarial network (GAN) relies on the convolutional neural networks (CNN)-based generators and discriminators. However, CNN cannot process the long-range dependencies because its convolution operator has a local receptive field, which can bring some issues to GAN, such as the optimization, the loss of feature resolution and the fine details. To meet the problem of long-term dependence, we propose a GAN model based on shifted windows Transformer architecture, called Swin-GAN, in which the CNN architecture is replaced by Transformer. In our model, we build a memory-friendly generator based on the shifted window attention mechanism to gradually increase the resolution of feature maps at each stage. Another, we build a multi-scale discriminator to split the image into patches of different sizes as the input at different stages, which can achieve the balance between capturing global contextual semantic information and local detailed features. To further improve the fidelity and stability, we use the techniques such as data enhancement, layer normalization and relative position coding in our model. Compared with the current schemes, the experimental results show that our scheme has better performance, fewer parameters and lower computational cost. Specifically, Params value of Swin-GAN model is 30.254M, and Floating-Point Operations Per Second (FLOPs) value is 4.086G. Inception Score (IS) is 9.04 and Fréchet Inception Distance (FID) is 9.23 in CIFAR-10.
What problem does this paper attempt to address?