Abstract: We present GaussianAvatar, an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video. We start by introducing animatable 3D Gaussians to explicitly represent humans in various poses and clothing styles. Such an explicit and animatable representation can fuse 3D appearances more efficiently and consistently from 2D observations. Our representation is further augmented with dynamic properties to support pose-dependent appearance modeling, where a dynamic appearance network along with an optimizable feature tensor is designed to learn the motion-to-appearance mapping. Moreover, by leveraging the differentiable motion condition, our method enables a joint optimization of motions and appearances during avatar modeling, which helps to tackle the long-standing issue of inaccurate motion estimation in monocular settings. The efficacy of GaussianAvatar is validated on both the public dataset and our collected dataset, demonstrating its superior performances in terms of appearance quality and rendering efficiency.

What problem does this paper attempt to address?

This paper addresses three problems that arise when creating realistic human avatars from a single video:

1. **Dynamic Appearance Modeling**: Traditional methods based on implicit representations are inefficient and capture too little detail when modeling dynamic appearances. The paper proposes an explicit 3D Gaussian representation that fuses 3D appearances from 2D observations more efficiently and consistently.
2. **Inaccurate Motion Estimation**: Human motion estimated from monocular video is often inaccurate because of limited views and image noise, which in turn degrades the modeling of dynamic clothing deformation. The paper refines the initial motion estimate by jointly optimizing motion and appearance, which improves the final modeling quality.
3. **Pose-Dependent Appearance Modeling**: To support appearance that changes with pose, the paper introduces a dynamic appearance network and an optimizable feature tensor that learn the mapping from motion to appearance.

The main contributions of the paper are:

- **Introducing Animatable 3D Gaussian Representations**: By explicitly representing the human body surface, the method fuses 3D appearances from 2D observations more consistently and efficiently.
- **Enhancing Dynamic Attributes**: The animatable 3D Gaussians are augmented with dynamic attributes to support pose-dependent appearance modeling.
- **Jointly Optimizing Motion and Appearance**: Motion and appearance are optimized simultaneously during modeling, which corrects misalignment in the initial motion estimate and improves the final appearance quality.

These innovations enable GaussianAvatar to create realistic avatars with dynamic appearances from a single video, with applications in virtual reality, augmented reality, the metaverse, games, and film.
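The idea of jointly optimizing motion and appearance can be illustrated with a toy problem. The sketch below is not the paper's method: it replaces the body model and Gaussian renderer with 2D points rotated by one angle per frame, and every name in it (`rotate`, `mean_squared_error`, `joint_optimize`, the learning rate, the step count) is a hypothetical choice made for illustration. What it shares with the summary above is the structure: because the posed output is differentiable with respect to both the canonical points (standing in for appearance) and the per-frame angles (standing in for motion), one loss can correct a noisy initial motion estimate while fitting the appearance.

```python
import math


def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def mean_squared_error(points, angles, observations):
    """Average squared distance between posed points and observed points."""
    total = 0.0
    for angle, frame in zip(angles, observations):
        for point, target in zip(points, frame):
            px, py = rotate(point, angle)
            total += (px - target[0]) ** 2 + (py - target[1]) ** 2
    return total / (len(angles) * len(points))


def joint_optimize(points, angles, observations, lr=0.5, steps=500):
    """Gradient descent on canonical points AND per-frame angles together.

    Toy assumption: "appearance" is a set of canonical 2D points and
    "motion" is one rotation angle per frame.
    """
    points = [list(p) for p in points]
    angles = list(angles)
    scale = 2.0 / (len(angles) * len(points))
    for _ in range(steps):
        grad_points = [[0.0, 0.0] for _ in points]
        grad_angles = [0.0 for _ in angles]
        for t, (angle, frame) in enumerate(zip(angles, observations)):
            c, s = math.cos(angle), math.sin(angle)
            for i, (point, target) in enumerate(zip(points, frame)):
                x, y = point
                rx = c * x - s * y - target[0]  # residual, x component
                ry = s * x + c * y - target[1]  # residual, y component
                # d(loss)/d(point) = scale * R(angle)^T @ residual
                grad_points[i][0] += scale * (c * rx + s * ry)
                grad_points[i][1] += scale * (-s * rx + c * ry)
                # d(loss)/d(angle) = scale * residual . (dR/d(angle) @ point)
                grad_angles[t] += scale * (
                    rx * (-s * x - c * y) + ry * (c * x - s * y)
                )
        for i, g in enumerate(grad_points):
            points[i][0] -= lr * g[0]
            points[i][1] -= lr * g[1]
        for t, g in enumerate(grad_angles):
            angles[t] -= lr * g
    return [tuple(p) for p in points], angles


# Synthetic ground truth (hypothetical data): points on an ellipse, 4 frames.
true_points = [
    (math.cos(k * math.pi / 4), 0.5 * math.sin(k * math.pi / 4))
    for k in range(8)
]
true_angles = [0.0, 0.4, 0.8, 1.2]
observations = [[rotate(p, a) for p in true_points] for a in true_angles]

# Coarse appearance guess (a circle) and noisy initial motion estimate.
init_points = [
    (0.8 * math.cos(k * math.pi / 4), 0.8 * math.sin(k * math.pi / 4))
    for k in range(8)
]
init_angles = [0.15, 0.3, 1.0, 1.15]

initial_loss = mean_squared_error(init_points, init_angles, observations)
refined_points, refined_angles = joint_optimize(
    init_points, init_angles, observations
)
final_loss = mean_squared_error(refined_points, refined_angles, observations)
print(f"loss before: {initial_loss:.6f}, loss after: {final_loss:.6f}")
```

In this toy setup a global rotation can be traded freely between the canonical points and the angles, so only the relative angles between frames are meaningful after optimization. Holding the angles fixed at their noisy initial values and optimizing only the points would leave a residual error that no choice of points can remove, which is the motivation for optimizing both together.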

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians

Generalizable and Animatable Gaussian Head Avatar

Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

Bundle Adjusted Gaussian Avatars Deblurring

MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

FAGhead: Fully Animate Gaussian Head from Monocular Videos

Expressive Gaussian Human Avatars from Monocular RGB Video

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

SAGA: Surface-Aligned Gaussian Avatar

Interactive Rendering of Relightable and Animatable Gaussian Avatars

Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities

GGAvatar: Geometric Adjustment of Gaussian Head Avatar

Deformable 3D Gaussian Splatting for Animatable Human Avatars