AgileAvatar: Stylized 3D Avatar Creation via Cascaded Domain Bridging

Shen Sang, Tiancheng Zhi, Guoxian Song, Minghao Liu, Chunpong Lai, Jing Liu, Xiang Wen, James Davis, Linjie Luo
DOI: https://doi.org/10.1145/3550469.3555402
2022-11-15
Abstract: Stylized 3D avatars have become increasingly prominent in our modern life. Creating these avatars manually usually involves laborious selection and adjustment of continuous and discrete parameters and is time-consuming for average users. Self-supervised approaches to automatically create 3D avatars from user selfies promise high quality with little annotation cost but fall short in application to stylized avatars due to a large style domain gap. We propose a novel self-supervised learning framework to create high-quality stylized 3D avatars with a mix of continuous and discrete parameters. Our cascaded domain bridging framework first leverages a modified portrait stylization approach to translate input selfies into stylized avatar renderings as the targets for desired 3D avatars. Next, we find the best parameters of the avatars to match the stylized avatar renderings through a differentiable imitator we train to mimic the avatar graphics engine. To ensure we can effectively optimize the discrete parameters, we adopt a cascaded relaxation-and-search pipeline. We use a human preference study to evaluate how well our method preserves user identity compared to previous work as well as manual creation. Our results achieve much higher preference scores than previous work and close to those of manual creation. We also provide an ablation study to justify the design choices in our pipeline.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
This paper addresses the automatic creation of high-quality, personalized, stylized 3D avatars from user selfies. Creating such avatars manually requires tedious selection and adjustment across a large number of art assets, which is time-consuming and difficult for average users. Existing self-supervised methods can generate semi-realistic 3D avatars from selfies and preserve user identity well, but they are not effective for stylized avatars because of the large style domain gap between selfies and stylized avatars. To overcome this, the authors propose a self-supervised learning framework that handles a mix of continuous and discrete parameters to create high-quality stylized 3D avatars. The framework narrows the style domain gap in three cascaded stages: 1) Portrait stylization, which translates the input selfie into a stylized avatar rendering; 2) Self-supervised avatar parameterization, which finds the best avatar parameters through a differentiable imitator trained to mimic the graphics engine; 3) Avatar vector conversion, which converts parameters from the relaxed avatar vector space into the strict avatar vector space so that the graphics engine can use them directly. The paper also uses a human preference study to evaluate how well the method preserves user identity. The method scores much higher than previous work and close to manual creation. An ablation study justifies the design choices in the pipeline.
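The third stage, moving from a relaxed to a strict avatar vector, can be illustrated with a small sketch. The Python below is a toy illustration under stated assumptions, not the authors' implementation: it assumes each discrete attribute is stored as a relaxed vector of soft weights over its options, and it converts those to strict one-hot choices either by taking the largest entry or by a greedy search that scores every option with a caller-supplied loss. The function names, the dictionary layout, and `loss_fn` (a stand-in for comparing an imitator rendering against the stylized target) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical layout: attribute name -> weights over that attribute's options.
AvatarVector = Dict[str, Sequence[float]]


def to_one_hot(index: int, size: int) -> List[float]:
    """Return a strict one-hot vector of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def argmax_discretize(relaxed: AvatarVector) -> Dict[str, List[float]]:
    """Baseline: snap each relaxed attribute to its highest-weight option."""
    strict = {}
    for name, weights in relaxed.items():
        best = max(range(len(weights)), key=lambda i: weights[i])
        strict[name] = to_one_hot(best, len(weights))
    return strict


def search_discretize(
    relaxed: AvatarVector,
    loss_fn: Callable[[AvatarVector], float],
) -> Dict[str, List[float]]:
    """Greedy search: fix one attribute at a time to its lowest-loss option.

    `loss_fn` is an assumed scoring function; lower means a better match.
    Attributes not yet visited keep their relaxed weights while scoring.
    """
    current = {name: list(weights) for name, weights in relaxed.items()}
    for name, weights in relaxed.items():
        best_index, best_loss = 0, float("inf")
        for index in range(len(weights)):
            candidate = dict(current)
            candidate[name] = to_one_hot(index, len(weights))
            loss = loss_fn(candidate)
            if loss < best_loss:
                best_index, best_loss = index, loss
        current[name] = to_one_hot(best_index, len(weights))
    return current
```

When the loss measures how closely a rendering matches the target, the search variant can select an option that plain argmax over the relaxed weights would miss, which is one plausible motivation for searching rather than rounding directly.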