Multi-domain Information Fusion for Key-Points Guided GAN Inversion

Ruize Xu, Xiaowen Qiu, Boan He, Weifeng Ge, Wenqiang Zhang
DOI: https://doi.org/10.1007/978-981-99-8552-4_12
2024-01-01
Abstract: In recent years, GAN inversion has emerged as a powerful technique for bridging the gap between the real and fake image domains, and it has become increasingly important for applying pre-trained GAN models to real-image editing. However, current GAN inversion methods are limited by their network parameters and model structures, leaving room for improvement in both accurate reconstruction and latent editing. In this paper, we propose a two-stage model. In the first stage, we fine-tune a pre-trained Masked Autoencoder and fuse information from multiple layers to obtain an initial global latent code. In the second stage, guided by facial landmarks, we use this latent code as global queries for cross-attention-based fusion of local key-patch, key-point feature, and residual image information. This design allows our model to better embed images in the $W+$ space and perform attribute editing, achieving better results than current state-of-the-art methods. We conduct extensive experiments to demonstrate the capabilities of our model and the roles of its modules, and we study how information from different domains affects inversion.
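The second-stage idea, where global latent codes act as queries that gather information from several local domains through cross-attention, can be illustrated with a small sketch. This is not the authors' implementation; it is a toy under stated assumptions: single-head scaled dot-product attention with identity projections (no learned weights), the three domains' tokens simply concatenated into one context sequence, and a residual update added back to the queries. The function names (`cross_attention`, `fuse_multi_domain`) and the toy width `d = 8` are hypothetical; the 18 query rows mirror the number of style inputs of a 1024px StyleGAN2 generator, whose real width is 512.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (a is n x k, b is k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity projections.

    Each query row (one W+ style vector) attends over every context token.
    Real models would apply learned query/key/value projections first (omitted here).
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(context))          # (n_queries x n_tokens)
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)                         # attention weights per query
    return matmul(weights, context)                        # (n_queries x d)


def fuse_multi_domain(w_plus: Matrix, patch_feats: Matrix,
                      keypoint_feats: Matrix, residual_feats: Matrix) -> Matrix:
    """Hypothetical fusion step: refine global W+ codes with local-domain tokens.

    Tokens from all three domains are concatenated into one context sequence,
    and the attended result is added back to the queries as a residual update.
    """
    context = patch_feats + keypoint_feats + residual_feats  # concatenate token rows
    update = cross_attention(w_plus, context)
    return [[w + u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(w_plus, update)]


if __name__ == "__main__":
    random.seed(0)
    d, n_styles = 8, 18  # toy width; 18 style vectors as in a 1024px StyleGAN2

    def rand_tokens(n: int) -> Matrix:
        return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

    w_plus = rand_tokens(n_styles)
    refined = fuse_multi_domain(w_plus, rand_tokens(4), rand_tokens(5), rand_tokens(3))
    print(len(refined), len(refined[0]))  # 18 8
```

Keeping the queries fixed to the number of style inputs means the output stays the same shape as the initial latent code, so the refined codes can be fed to the generator directly; how many tokens each domain contributes does not change that shape.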