Face Identity-Aware Disentanglement in StyleGAN

Adrian Suwała, Bartosz Wójcik, Magdalena Proszewska, Jacek Tabor, Przemysław Spurek, Marek Śmieja
2023-09-21
Abstract: Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though the state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles face attributes from a person's identity. Our key idea is to perform training on images retrieved from movie frames, where a given person appears in various poses and with different attributes. By applying a type of contrastive loss, we encourage the model to group images of the same person in similar regions of latent space. Our experiments demonstrate that the modifications of face attributes performed by PluGeN4Faces are significantly less invasive on the remaining characteristics of the image than in the existing state-of-the-art models.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses a side effect of attribute editing with conditional generative adversarial networks built on StyleGAN: changing a requested facial attribute often also changes the person's identity. The authors propose PluGeN4Faces, a plugin model that disentangles the latent space of StyleGAN so that modifications of facial attributes do not significantly affect the person's identity or other facial features.

The key innovation is training on images taken from movie frames, which show the same person in different poses and with different attributes. A contrastive loss encourages the model to cluster images of the same person in similar regions of the latent space.

Specifically, PluGeN4Faces is a conditional invertible normalizing flow (cINF) attached to the style space of StyleGAN. It transforms the style codes produced by the StyleGAN encoder, conditioned on the layer index of each style code, into a disentangled space in which each labeled attribute is modeled by separate latent dimensions and images of the same person lie in similar regions.

Experiments, supported by quantitative analysis, show that PluGeN4Faces significantly reduces the impact of attribute editing on other image features (including person identity) compared to existing models.

The contributions of the paper are:
1. A StyleGAN plugin model for editing attributes of real images, trained on real images and using an encoder network to encode images into the style space of StyleGAN.
2. Improved representation disentanglement in conditional generative models, with person identity encoded explicitly through a contrastive loss, which makes requested attribute modifications less intrusive on other image characteristics (including identity).
3. A rigorous evaluation of the proposed solution, with fair performance comparisons against related models.
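
To make the contrastive idea concrete, the following is a minimal NumPy sketch of a generic margin-based pairwise contrastive loss. It is not the loss used in the paper, which is described only as "a type of contrastive loss". The function name `pairwise_contrastive_loss`, the `margin` parameter, and the toy latent codes are assumptions made for illustration. Pairs of codes from the same person are pulled together, and pairs from different people are pushed at least `margin` apart.

```python
import numpy as np


def pairwise_contrastive_loss(codes, person_ids, margin=1.0):
    """Generic margin-based contrastive loss over all unordered pairs.

    Hypothetical helper for illustration only, not the paper's exact loss.
    codes:      (n, d) array of latent codes (e.g. identity dimensions).
    person_ids: (n,) array of identity labels, one per code.
    """
    codes = np.asarray(codes, dtype=float)
    person_ids = np.asarray(person_ids)
    n = len(codes)

    # Euclidean distance between every pair of codes.
    diffs = codes[:, None, :] - codes[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # same[i, j] is True when codes i and j belong to the same person.
    same = person_ids[:, None] == person_ids[None, :]
    # Count each unordered pair once and skip the diagonal.
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)

    # Same person: penalize any distance (pull together).
    pull = (dists ** 2)[same & upper]
    # Different people: penalize only when closer than the margin (push apart).
    push = (np.maximum(0.0, margin - dists) ** 2)[~same & upper]

    num_pairs = max(int(upper.sum()), 1)
    return (pull.sum() + push.sum()) / num_pairs


if __name__ == "__main__":
    ids = np.array([0, 0, 1, 1])
    # Toy codes: frames of the same person close together, people far apart.
    clustered = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
    # Toy codes: frames of the same person far apart, different people close.
    mixed = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.0, 5.1]])
    print("clustered:", pairwise_contrastive_loss(clustered, ids))
    print("mixed:    ", pairwise_contrastive_loss(mixed, ids))
```

In this sketch the loss is lower when frames of the same person sit close together in the latent space, which is the behavior the summary attributes to the contrastive term. In practice such a term would be written in an autodiff framework and combined with the training objective of the flow.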