Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

Zhe Li, Zerong Zheng, Lizhen Wang, Yebin Liu
2024-03-31
Abstract: Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two front & back canonical Gaussian maps where each pixel represents a 3D Gaussian. The learned template is adaptive to the wearing garments for modeling looser clothes like dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization given novel poses. Overall, our method can create lifelike avatars with dynamic, realistic and generalized appearances. Experiments show that our method outperforms other state-of-the-art approaches. Code:
Graphics
What problem does this paper attempt to address?
The paper addresses the problem of creating realistic, animatable human avatars from RGB videos. It proposes a method called "Animatable Gaussians," which combines 2D convolutional neural networks (CNNs) with 3D Gaussian splatting to model human avatars with dynamic, realistic, and generalizable appearances. The main contributions are:

1. **A new avatar representation**: Animatable Gaussians uses 3D Gaussian splatting and 2D CNNs to create high-fidelity avatars. This addresses the difficulty that pure multi-layer perceptrons (MLPs) have in regressing pose-dependent garment details.
2. **Template-guided parameterization**: The method learns a character-specific parametric template from the input videos and parameterizes it on front and back canonical Gaussian maps, in which each pixel represents a 3D Gaussian. Because the learned template adapts to the worn garments, the method can model looser clothing such as dresses. This 2D parameterization also allows a StyleGAN-based CNN to learn the pose-dependent Gaussian maps.
3. **Pose projection strategy**: To improve generalization to novel poses, the paper introduces a pose projection strategy based on Principal Component Analysis (PCA). Projecting a novel pose into the distribution of training poses yields a reasonable interpolation within that distribution and therefore higher-quality novel-pose synthesis.

Overall, Animatable Gaussians faithfully reconstructs human details under training poses and generates plausible, high-quality animations for novel poses. Experimental results show that the method outperforms current state-of-the-art methods in reconstruction accuracy and animation quality.
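The idea behind a PCA-based pose projection can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fit_pose_pca` and `project_pose`, the flat pose feature vectors, the number of components, and the clamping of coefficients to two standard deviations are all assumptions made for illustration. The sketch fits a PCA basis to training pose features, projects a novel pose onto that basis, and limits each coefficient to the range spanned by the training data.

```python
import numpy as np


def fit_pose_pca(train_poses, num_components):
    """Fit a PCA basis to training pose features (one row per frame)."""
    mean = train_poses.mean(axis=0)
    centered = train_poses - mean
    # SVD of the centered data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]
    # Standard deviation of the training data along each kept direction.
    denom = np.sqrt(max(len(train_poses) - 1, 1))
    std = singular_values[:num_components] / denom
    return mean, basis, std


def project_pose(pose, mean, basis, std, max_sigma=2.0):
    """Project a novel pose into the PCA subspace and clamp its coefficients."""
    coeffs = basis @ (pose - mean)
    # Assumption: clamp each coefficient to +/- max_sigma standard deviations
    # so the result stays inside the distribution of training poses.
    coeffs = np.clip(coeffs, -max_sigma * std, max_sigma * std)
    return mean + basis.T @ coeffs


# Hypothetical data: 200 training frames with 6-dimensional pose features.
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.1, 0.05])
train = rng.normal(size=(200, 6)) * scales
mean, basis, std = fit_pose_pca(train, num_components=3)

# A pose far outside the training distribution is pulled back into it.
novel = np.full(6, 10.0)
projected = project_pose(novel, mean, basis, std)
```

In this sketch, a pose that already lies inside the clamped PCA subspace is left unchanged by the projection, while an out-of-distribution pose is mapped to a nearby in-distribution one.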