LaDiffGAN: Training GANs with Diffusion Supervision in Latent Spaces

Xuhui Liu, Bohan Zeng, Sicheng Gao, Shanglin Li, Yutang Feng, Hong Li, Boyu Liu, Jianzhuang Liu, Baochang Zhang
DOI: https://doi.org/10.1109/cvprw63382.2024.00118
2024-01-01
Computer Vision and Pattern Recognition
Abstract: Diffusion models have become increasingly popular in many computer vision tasks, but they fail to achieve satisfactory results on unsupervised image-to-image translation because they require massive training data and rely heavily on extra guidance. GANs can alleviate these issues, albeit with suboptimal quality. In this paper, we combine the advantages of both by training GANs with diffusion supervision in latent spaces (LaDiffGAN) to solve the unsupervised image-to-image translation task. First, to improve style transfer quality, we encode the data in specific latent spaces with the styles of the target and source domains. Second, we introduce a diffusion process with different amounts of Gaussian noise to enhance the modeling capability of GANs on complex data distributions, and accordingly design a latent diffusion GAN loss to align the latent features of generated and training images. Finally, we introduce a heterogeneous conditional denoising loss that incorporates image-level supervision to further improve the quality of the generated results. LaDiffGAN significantly alleviates the drawbacks associated with diffusion models, such as data leakage, high inference cost, and heavy dependence on large training datasets. Extensive experiments show that LaDiffGAN outperforms previous GAN models and delivers comparable or even better performance than diffusion models.
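The abstract mentions applying a diffusion process with different amounts of Gaussian noise to latent features. As a rough illustration only, the sketch below shows the standard DDPM-style forward noising step, z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps, applied to a real and a generated latent vector at the same randomly chosen timestep. This is not the authors' implementation: the linear schedule, the 1000 steps, the 8-dimensional latents, and all function names (`linear_beta_schedule`, `cumulative_alphas`, `diffuse_latent`) are assumptions made for the example, and the paper's actual losses are not reproduced here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product abar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def diffuse_latent(z0, t, alpha_bars, rng):
    """Sample z_t ~ N(sqrt(abar_t) * z0, (1 - abar_t) * I) for one latent vector."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * z + sigma * rng.gauss(0.0, 1.0) for z in z0]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Hypothetical stand-ins for encoder outputs of a training and a generated image.
z_real = [rng.gauss(0.0, 1.0) for _ in range(8)]
z_fake = [rng.gauss(0.0, 1.0) for _ in range(8)]

# Noise both latents at the same timestep; larger t means more noise.
t = rng.randrange(len(alpha_bars))
z_real_t = diffuse_latent(z_real, t, alpha_bars, rng)
z_fake_t = diffuse_latent(z_fake, t, alpha_bars, rng)
```

In a setup of this kind, a discriminator would typically compare `z_real_t` and `z_fake_t` rather than the clean latents, so that the generator receives a training signal across many noise levels.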