TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces

Bernardo Biesseck, Pedro Vidal, Luiz Coelho, Roger Granada, David Menotti
2024-09-05
Abstract: A robust face recognition model must be trained using datasets that include a large number of subjects and numerous samples per subject under varying conditions (such as pose, expression, age, noise, and occlusion). Due to ethical and privacy concerns, large-scale real face datasets have been discontinued, such as MS1MV3, and synthetic face generators have been proposed, utilizing GANs and Diffusion Models, such as SYNFace, SFace, DigiFace-1M, IDiff-Face, DCFace, and GANDiffFace, aiming to supply this demand. Some of these methods can produce high-fidelity realistic faces, but with low intra-class variance, while others generate high-variance faces with low identity consistency. In this paper, we propose a Triple Condition Diffusion Model (TCDiff) to improve face style transfer from real to synthetic faces through 2D and 3D facial constraints, enhancing face identity consistency while keeping the necessary high intra-class variance. Face recognition experiments using 1k, 2k, and 5k classes of our new dataset for training outperform state-of-the-art synthetic datasets in real face benchmarks such as LFW, CFP-FP, AgeDB, and BUPT. Our source code is available at: https://github.com/BOVIFOCR/tcdiff.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to improve the quality of synthetic face datasets, measured by how well face recognition models trained on them perform. It proposes a triple-condition diffusion model, TCDiff, which improves style transfer from real to synthetic faces through 2D and 3D facial constraints, enhancing identity consistency while maintaining high intra-class variation.

The method addresses two weaknesses of current synthetic face generators:

1. **Low intra-class variation**: Some methods generate high-fidelity, photorealistic faces, but different samples of the same identity differ very little, which limits their usefulness for training face recognition models.
2. **Low identity consistency**: Other methods generate highly varied faces, but the identity features are inconsistent between samples of the same identity.

By introducing 2D and 3D facial constraints, TCDiff aims to generate synthetic faces with both high identity consistency and the necessary high intra-class variation. In the reported experiments, training on 1k, 2k, and 5k classes of the resulting dataset outperforms state-of-the-art synthetic datasets on real face benchmarks such as LFW, CFP-FP, AgeDB, and BUPT.
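
The trade-off between intra-class variation and identity consistency can be made concrete with a small sketch. It assumes each face image has already been mapped to an embedding vector by some face recognition model (not specified here), and it uses mean pairwise cosine similarity as a rough proxy. The function names and toy vectors are hypothetical, and this is not the evaluation protocol used in the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def identity_stats(classes):
    """Summarize a dataset given as {identity label: [embedding, ...]}.

    Returns (intra, inter):
      intra: per-identity mean similarity between samples of that identity.
             Values near 1 suggest little intra-class variation.
      inter: mean similarity between the centroids of different identities.
             If intra is not clearly above inter, identities are poorly
             separated, which hints at weak identity consistency.
    """
    intra = {
        label: mean_pairwise_similarity(embs) for label, embs in classes.items()
    }
    centroids = [
        [sum(column) / len(embs) for column in zip(*embs)]
        for embs in classes.values()
    ]
    inter = mean_pairwise_similarity(centroids)
    return intra, inter


# Toy 2D "embeddings" (hypothetical data, for illustration only).
toy_classes = {
    "id_a": [[1.0, 0.0], [1.0, 0.0]],  # identical samples: no variation
    "id_b": [[0.0, 1.0], [0.0, 1.0]],
}
toy_intra, toy_inter = identity_stats(toy_classes)
```

Under these assumptions, a generator with the first weakness would show intra-class similarity close to 1 for every identity, while one with the second weakness would show intra-class similarity that is barely higher than the similarity between different identities.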