Self-supervised Deformation Modeling for Facial Expression Editing

ShahRukh Athar, Zhixin Shu, Dimitris Samaras
DOI: https://doi.org/10.1109/fg47880.2020.00115
2020-11-01
Abstract: Deep generative models have recently demonstrated impressive results in photo-realistic facial image synthesis and editing. Existing neural network-based approaches usually rely only on texture generation to edit expressions and largely neglect motion information. However, facial expressions are inherently the result of muscle movement. In this work, we propose a novel end-to-end network that disentangles the task of facial editing into two steps: a “motion-editing” step and a “texture-editing” step. In the “motion-editing” step, we explicitly model facial movement through an image deformation, warping the image into the desired expression. In the “texture-editing” step, we generate the necessary textures, such as teeth and shading effects, for a photorealistic result. Our physically-based task-disentanglement system design allows each step to learn a focused task, so the texture step need not generate texture to hallucinate motion. Our system is trained in a self-supervised manner, requiring no ground truth deformation annotation. Using Action Units [8] as the representation for facial expression, our method improves the state-of-the-art facial expression editing performance in both qualitative and quantitative evaluations.
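The two-step idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: in the paper both the deformation and the textures come from a trained network, whereas here the displacement field `flow` and the `texture_residual` are simply passed in as arrays. The function names, the backward-warping convention, bilinear sampling, and the additive way of combining the two steps are all assumptions made for illustration.

```python
import numpy as np


def warp_image(image, flow):
    """Backward-warp a float image (H, W, C) with a dense displacement field (H, W, 2).

    Assumed convention: flow[y, x] = (dx, dy) means output pixel (x, y) samples
    the input at (x + dx, y + dy). Sampling is bilinear, clamped at the borders.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates for every output pixel, kept inside the image.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each source coordinate.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets, broadcast over the channel axis.
    wx = (src_x - x0)[..., None]
    wy = (src_y - y0)[..., None]

    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def edit_expression(image, flow, texture_residual):
    """Hypothetical two-step edit: warp first (motion), then add texture.

    The additive residual is an assumption of this sketch, standing in for
    whatever the texture-editing step generates (e.g. teeth, shading).
    """
    warped = warp_image(image, flow)
    return np.clip(warped + texture_residual, 0.0, 1.0)
```

With an all-zero `flow` and an all-zero `texture_residual`, `edit_expression` returns the input unchanged, which makes the division of labor visible: geometry changes enter only through `flow`, and new appearance enters only through `texture_residual`.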