Expressive Whole-Body 3D Gaussian Avatar

Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito
2024-07-31
Abstract: Facial expression and hand motions are necessary to express our emotions and interact with the world. Nevertheless, most of the 3D human avatars modeled from a casually captured video only support body motions without facial expressions and hand motions. In this work, we present ExAvatar, an expressive whole-body 3D human avatar learned from a short monocular video. We design ExAvatar as a combination of the whole-body parametric mesh model (SMPL-X) and 3D Gaussian Splatting (3DGS). The main challenges are 1) a limited diversity of facial expressions and poses in the video and 2) the absence of 3D observations, such as 3D scans and RGBD images. The limited diversity in the video makes animations with novel facial expressions and poses non-trivial. In addition, the absence of 3D observations could cause significant ambiguity in human parts that are not observed in the video, which can result in noticeable artifacts under novel motions. To address them, we introduce our hybrid representation of the mesh and 3D Gaussians. Our hybrid representation treats each 3D Gaussian as a vertex on the surface with pre-defined connectivity information (i.e., triangle faces) between them following the mesh topology of SMPL-X. It makes our ExAvatar animatable with novel facial expressions by being driven by the facial expression space of SMPL-X. In addition, by using connectivity-based regularizers, we significantly reduce artifacts in novel facial expressions and poses.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two main challenges:

1. **Limited diversity of expressions and poses**: Most 3D human avatars built from casually captured videos support only body motions, without facial expressions or hand motions. A short video also contains few distinct expressions and poses, which makes animating the avatar with novel facial expressions and poses non-trivial.
2. **Lack of 3D observations**: Without 3D observations such as 3D scans or RGBD images, body parts that are not observed in the video remain highly ambiguous, which can produce noticeable artifacts under novel motions.

To address these challenges, the authors propose ExAvatar, a 3D human avatar that combines the whole-body parametric mesh model (SMPL-X) with 3D Gaussian Splatting (3DGS). ExAvatar is created from a short monocular video and can be animated with SMPL-X's facial expression codes and 3D poses, even if the video contains limited expressions and poses. In the hybrid representation, each 3D Gaussian is treated as a vertex on the surface, and the connectivity between Gaussians (triangle faces) is predefined to follow the SMPL-X mesh topology. This lets ExAvatar be driven by SMPL-X's facial expression space, so it can be animated with novel facial expressions. In addition, connectivity-based regularizers significantly reduce the artifacts that can appear under novel facial expressions and poses.

Specifically, the main contributions of the paper include:

- Proposing ExAvatar, an expressive whole-body 3D human avatar that can be created from a short monocular video without additional 3D observations.
- Designing a hybrid representation that combines the surface mesh and 3D Gaussians, allowing ExAvatar to be animated with any facial expression code from SMPL-X, even if the video contains limited expressions.
- Utilizing connectivity information between 3D Gaussians to significantly reduce artifacts under novel facial expressions and poses.

According to the authors' experiments, ExAvatar significantly outperforms previous methods on various benchmarks.
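
To illustrate how connectivity between 3D Gaussians can be used for regularization, here is a minimal sketch of a uniform Laplacian smoothness penalty over triangle faces. This is an assumption made for illustration only: the summary does not specify the exact form of the paper's regularizers, and the function names, the toy two-triangle mesh, and the per-Gaussian offset values below are all hypothetical rather than taken from ExAvatar or SMPL-X.

```python
from collections import defaultdict


def build_neighbors(faces):
    """Map each vertex index to the vertices it shares a triangle edge with."""
    neighbors = defaultdict(set)
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return neighbors


def laplacian_penalty(values, faces):
    """Mean squared gap between each per-vertex value and its neighbor average.

    `values` holds one vector per Gaussian (e.g. a position offset), indexed
    the same way as the mesh vertices. Vertices that appear in no face are
    ignored. A small result means neighboring Gaussians agree with each other.
    """
    neighbors = build_neighbors(faces)
    total = 0.0
    for i, nbrs in neighbors.items():
        dim = len(values[i])
        mean = [sum(values[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        total += sum((values[i][k] - mean[k]) ** 2 for k in range(dim))
    return total / len(neighbors)


# Toy topology: a square split into two triangles (not the SMPL-X mesh).
faces = [(0, 1, 2), (0, 2, 3)]

# Hypothetical per-Gaussian offsets from the template surface.
smooth_offsets = [(0.25, 0.0, 0.0)] * 4  # every Gaussian moves the same way
spiky_offsets = [
    (0.0, 0.0, 0.0),
    (0.0, 0.0, 0.0),
    (0.0, 0.0, 0.0),
    (0.0, 0.0, 0.9),  # one Gaussian drifts away from its neighbors
]

print("smooth:", laplacian_penalty(smooth_offsets, faces))
print("spiky: ", laplacian_penalty(spiky_offsets, faces))
```

In this sketch the uniformly shifted offsets incur essentially no penalty, while the single outlier Gaussian is penalized. The intuition is that a term of this kind can pull Gaussians on unobserved body parts toward values consistent with their observed mesh neighbors.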