Swin-UNIT: Transformer-based GAN for High-resolution Unpaired Image Translation

Yifan Li, Yaochen Li, Wenneng Tang, Zhifeng Zhu, Jinhuo Yang, Yuehu Liu
DOI: https://doi.org/10.1145/3581783.3612518
2023-01-01
Abstract: Transformer models have succeeded in a wide range of computer vision tasks owing to their capacity for modeling long-range dependencies. However, their application to high-resolution unpaired image translation with GANs has been limited because the cost of self-attention grows quadratically with the spatial resolution of the input features. In this paper, we propose Swin-UNIT, a novel transformer-based GAN for high-resolution unpaired image translation. We design a two-stage generator consisting of a global style translation (GST) module and a recurrent detail supplement (RDS) module. The GST module translates low-resolution global features using self-attention. The RDS module uses cross-attention to propagate information quickly from the global features to the detail features at high resolution. Moreover, we customize a dual-branch discriminator to guide the generator. Extensive experiments demonstrate that our model achieves state-of-the-art results on unpaired image translation tasks.
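To make the quadratic-cost argument concrete, the following minimal sketch counts query-key pairs for global attention. It is an illustration only, not the paper's implementation: the function names and the 256x256 and 32x32 feature-map sizes are assumptions chosen for the example, and the actual GST and RDS modules may use windowed (Swin-style) attention, which would change the counts.

```python
def self_attention_pairs(height, width):
    """Query-key pairs for global self-attention over a height x width feature map."""
    tokens = height * width
    return tokens * tokens


def cross_attention_pairs(q_height, q_width, kv_height, kv_width):
    """Query-key pairs when a q_height x q_width map attends to a kv_height x kv_width map."""
    return (q_height * q_width) * (kv_height * kv_width)


# Hypothetical feature-map sizes, not taken from the paper.
high_res = (256, 256)
low_res = (32, 32)

# Global self-attention applied directly at high resolution.
full = self_attention_pairs(*high_res)

# Two-stage alternative: self-attention at low resolution, then
# cross-attention from high-resolution queries to low-resolution keys.
low_self = self_attention_pairs(*low_res)
cross = cross_attention_pairs(*high_res, *low_res)
two_stage = low_self + cross

print(f"full self-attention pairs: {full}")
print(f"two-stage pairs:           {two_stage}")
print(f"reduction factor:          {full / two_stage:.1f}x")
```

Under these assumed sizes, the two-stage count is roughly 63 times smaller than full-resolution self-attention, which suggests why a design of this kind keeps self-attention at low resolution and reaches high resolution through cross-attention.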