PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks
Xiaoxiong Du, Jun Peng, Yiyi Zhou, Jinlu Zhang, Siting Chen, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji
DOI: https://doi.org/10.1145/3581783.3612067
2023-01-01
Abstract: Synthesizing vivid human portraits is a research hot spot in image generation with a wide range of applications. In addition to fidelity, controllability is another key factor that has long hindered its development. To address this issue, existing solutions usually adopt either textual or visual conditions for the target face synthesis, e.g., descriptions or segmentation masks, but neither can fully control the generation because of the intrinsic shortcomings of each condition. In this paper, we propose to use both types of prior information to facilitate controllable face generation. In particular, the segmentation mask provides coarse-grained information about the face, such as face shape and pose, while the text description is used to render detailed face attributes, e.g., face color, makeup and gender. More importantly, we want the generation to be easily controlled by interactively editing both types of information, making face generation more applicable to real-world applications. To this end, we propose a novel face generation model termed PixelFace+. In PixelFace+, both the text and mask are encoded as pixel-wise priors, based on which the pixel synthesis process is conducted to produce the expected portraits. Meanwhile, the loss objectives are carefully designed to ensure that the generated faces are semantically aligned with both text and mask inputs. To validate PixelFace+, we conduct comprehensive experiments on the widely used MMCelebA benchmark. We quantitatively compare PixelFace+ with a set of recently proposed Text-to-Face (T2F) generation methods and also provide extensive qualitative analyses. The experimental results demonstrate that PixelFace+ not only outperforms existing generation methods in both image quality and conditional matching but also shows much better controllability of face generation.
Furthermore, PixelFace+ offers a convenient and interactive way of generating and manipulating faces by editing the text and mask inputs. Our source code and demo are provided in the supplementary materials.