3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang
2024-04-04
Abstract: We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively.
Computer Vision and Pattern Recognition
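The abstract mentions as-isometric-as-possible regularization on both the Gaussian means and the covariance matrices. The idea can be illustrated with a minimal sketch, assuming the regularizer compares distances between each Gaussian and its k nearest canonical-space neighbours before and after deformation and penalizes any change. The L1 penalty, k = 2, brute-force neighbour search, the Frobenius norm as the distance between covariance matrices, and every function name (`knn_indices`, `aiap_losses`, `rot_z`, and so on) are illustrative assumptions, not the paper's exact formulation. Pure Python is used only to keep the example self-contained; a practical version would operate on batched GPU tensors.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only (hypothetical names): the kNN size, the L1 penalty
# and the Frobenius distance between covariances are assumptions, not the
# paper's exact formulation.

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def knn_indices(points: Sequence[Vec3], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbours of each point (the point itself excluded)."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def frobenius_dist(a: Mat3, b: Mat3) -> float:
    """Frobenius norm of (a - b); unchanged when both are conjugated by a rotation."""
    return math.sqrt(sum((a[r][c] - b[r][c]) ** 2 for r in range(3) for c in range(3)))


def aiap_losses(canon_means, posed_means, canon_covs, posed_covs, k=2):
    """Mean absolute change in neighbour distances between canonical and posed space.

    Neighbourhoods are found once in canonical space, so the same pairs are
    compared before and after deformation.
    """
    nbrs = knn_indices(canon_means, k)
    loss_mean = loss_cov = 0.0
    pairs = 0
    for i, js in enumerate(nbrs):
        for j in js:
            # Isometry on positions: pairwise Euclidean distances should be preserved.
            loss_mean += abs(math.dist(canon_means[i], canon_means[j])
                             - math.dist(posed_means[i], posed_means[j]))
            # Same idea on covariances (assumed Frobenius distance).
            loss_cov += abs(frobenius_dist(canon_covs[i], canon_covs[j])
                            - frobenius_dist(posed_covs[i], posed_covs[j]))
            pairs += 1
    return loss_mean / pairs, loss_cov / pairs


# Demo helpers: rotation about z, 3x3 matrix product, transpose, and R @ v.
def rot_z(theta: float) -> List[List[float]]:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Mat3, b: Mat3) -> List[List[float]]:
    return [[sum(a[r][t] * b[t][c] for t in range(3)) for c in range(3)] for r in range(3)]


def transpose(a: Mat3) -> List[List[float]]:
    return [[a[c][r] for c in range(3)] for r in range(3)]


def apply(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


if __name__ == "__main__":
    means = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 1.0)]
    covs = [[[0.1 * (i + 1), 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.05]] for i in range(4)]
    R = rot_z(0.7)
    # A rigid rotation: positions rotate, covariances transform as R Σ R^T.
    rigid_means = [apply(R, m) for m in means]
    rigid_covs = [matmul(matmul(R, c), transpose(R)) for c in covs]
    print("rigid:", aiap_losses(means, rigid_means, covs, rigid_covs))
    # A uniform 2x stretch is not an isometry, so the position term grows.
    stretched = [tuple(2.0 * x for x in m) for m in means]
    print("stretched:", aiap_losses(means, stretched, covs, covs))
```

A rigid motion leaves both terms at (numerically) zero, while a non-isometric stretch is penalized. Fixing the neighbourhoods in canonical space is one plausible design choice here: it keeps the compared pairs stable as the deformation changes during training.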
What problem does this paper attempt to address?
The paper addresses the problem of creating animatable human avatars from monocular videos. It proposes a method based on 3D Gaussian Splatting (3DGS), combined with a learned non-rigid deformation network, that creates such avatars efficiently. Compared with existing NeRF-based and grid-based methods, the approach has the following advantages:
1. **Fast Training**: On a single GPU, training completes within 30 minutes, roughly 400 times faster than state-of-the-art methods.
2. **Real-time Rendering**: Rendering runs at more than 50 frames per second (FPS), roughly 250 times faster than state-of-the-art methods at inference time.
3. **High-Quality Rendering**: Experiments show that rendering quality is comparable to, and in some cases better than, the best existing methods for creating animatable avatars from monocular input.
4. **Pose-Dependent Non-Rigid Deformation**: The learned deformation network, together with as-isometric-as-possible regularization on the Gaussian means and covariance matrices, lets the method handle highly articulated poses and generalize to unseen poses.
In summary, by building on 3D Gaussian Splatting, the paper creates high-quality, animatable human avatars from monocular video while greatly reducing both training time and rendering time.
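Because the summary quotes both absolute figures (30 minutes, 50 FPS) and relative speedups (400x, 250x), a rough consistency check is possible. The sketch below assumes the ratios apply directly to those figures; since both figures are bounds ("within", "more than") and the ratios were measured against specific baselines, the derived numbers are order-of-magnitude estimates, not results reported by the paper. The constant and function names are hypothetical.

```python
# Rough consistency check of the headline numbers (hypothetical names).
# Assumption: the 400x training and 250x inference ratios apply directly to
# the "30 minutes" and "50 FPS" figures. Both figures are bounds and the
# ratios were measured against specific baselines, so these are estimates.

OURS_TRAIN_MINUTES = 30.0
OURS_RENDER_FPS = 50.0
TRAIN_SPEEDUP = 400.0
INFER_SPEEDUP = 250.0


def implied_baseline(train_minutes: float, fps: float,
                     train_speedup: float, infer_speedup: float) -> tuple:
    """Return (implied baseline training hours, implied baseline rendering FPS)."""
    baseline_hours = train_minutes * train_speedup / 60.0
    baseline_fps = fps / infer_speedup
    return baseline_hours, baseline_fps


hours, fps = implied_baseline(OURS_TRAIN_MINUTES, OURS_RENDER_FPS,
                              TRAIN_SPEEDUP, INFER_SPEEDUP)
print(f"implied baseline training: {hours:.0f} h (~{hours / 24:.1f} days)")
print(f"implied baseline rendering: {fps:.1f} FPS (~{1 / fps:.0f} s per frame)")
```

The implied baseline of roughly 200 hours of training (about 8 days) and 0.2 FPS (about 5 seconds per frame) is consistent with the abstract's description of NeRF-based methods as needing days of training and being extremely slow at inference.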