Face Attribute Invertion

X G Tu,Y Luo,H S Zhang,W J Ai,Z Ma,M Xie
DOI: https://doi.org/10.48550/arXiv.2001.04665
2020-01-14
Abstract: Manipulating human facial images between two domains is an important and interesting problem. Most existing methods address this issue by applying two generators or one generator with extra conditional inputs. In this paper, we propose a novel self-perception method based on GANs for automatic face attribute inversion. The proposed method takes face images as inputs and employs only a single generator without being conditioned on other inputs. Profiting from the multi-loss strategy and modified U-Net structure, our model is quite stable in training and capable of preserving finer details of the original face images.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses image translation between two facial attribute domains: editing a specific attribute of a face image (such as expression, age, or gender) while leaving the rest of the image unchanged. Existing methods usually rely on either two generators or one generator with additional conditional inputs. According to the paper, these methods reconstruct the details of the original image poorly and handle attribute inversion inefficiently. The paper therefore proposes a method based on Generative Adversarial Networks (GANs) that uses a single generator to invert a facial attribute automatically, without any conditional input. To stabilize training and retain more detail from the original face image, the method adopts a multi-loss strategy and a modified U-Net structure. Specifically, it relies on the following components:

1. **Single-generator design**: Unlike earlier methods, only one generator performs the conversion from one attribute state to the other, which reduces model complexity.
2. **Modified U-Net structure**: 1×1 filters are added between the corresponding layers of the encoder and the decoder. This reduces redundant information, making it easier to modify attribute-related regions while retaining more high-level information (such as identity features).
3. **Multi-task discriminator**: The discriminator not only distinguishes real images from generated ones but also classifies facial attributes, which guides the generator toward images with the opposite attribute.
4. **Forward-backward consistency loss**: A reconstruction loss and a feature-matching loss ensure that the generated image keeps the details of attribute-irrelevant regions while the target attribute is modified.

In the paper's experiments, the method is evaluated on inverting local attributes (such as glasses or an open or closed mouth) and global attributes (such as age and gender). Compared with existing methods such as CycleGAN, it is reported to generate higher-quality images, in terms of both visual quality and quantitative evaluation metrics.
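The 1×1 filters placed on the encoder-decoder skip connections can be illustrated with a small NumPy sketch. A 1×1 filter mixes channels independently at every pixel, so it reduces to a matrix multiply over the channel axis. The helper names (`conv1x1`, `filtered_skip`), the channel counts (64 reduced to 16), the random weights, and the choice to concatenate the filtered feature with the decoder feature are all assumptions made for illustration; the summary above does not specify these details, and this is not the authors' implementation.

```python
import numpy as np


def conv1x1(features, weight):
    """Apply a 1x1 convolution to a (C_in, H, W) feature map.

    weight has shape (C_out, C_in); each output pixel is a linear
    combination of the input channels at that same pixel.
    """
    return np.einsum("oc,chw->ohw", weight, features)


def filtered_skip(encoder_feat, decoder_feat, weight):
    """Hypothetical skip connection: compress the encoder feature with a
    1x1 filter, then concatenate it with the decoder feature (channel axis).
    """
    squeezed = conv1x1(encoder_feat, weight)
    return np.concatenate([squeezed, decoder_feat], axis=0)


rng = np.random.default_rng(0)
encoder_feat = rng.standard_normal((64, 32, 32))  # assumed encoder output
decoder_feat = rng.standard_normal((64, 32, 32))  # assumed decoder input
weight = rng.standard_normal((16, 64)) * 0.1      # assumed 64 -> 16 channels

merged = filtered_skip(encoder_feat, decoder_feat, weight)
print(merged.shape)  # (80, 32, 32)
```

With these assumed sizes, the skip path carries 16 channels instead of 64, so the decoder receives a compressed summary of the encoder feature rather than a full copy. This is one plausible reading of how such filters reduce redundant information passed across the skip connection.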