Abstract: 4D content generation has achieved remarkable progress recently. However, existing methods suffer from long optimization times, a lack of motion controllability, and a low quality of details. In this paper, we introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS). Our key insight is that combining explicit modeling of spatial transformations with static GS makes an efficient and powerful representation for 4D generation. Moreover, video generation methods have the potential to offer valuable spatial-temporal priors, enhancing the high-quality 4D generation. Specifically, we propose an integral framework with two major modules: 1) Image-to-4D GS - we initially generate static GS with DreamGaussianHD, followed by HexPlane-based dynamic generation with Gaussian deformation; and 2) Video-to-Video Texture Refinement - we refine the generated UV-space texture maps and meanwhile enhance their temporal consistency by utilizing a pre-trained image-to-video diffusion model. Notably, DG4D reduces the optimization time from several hours to just a few minutes, allows the generated 3D motion to be visually controlled, and produces animated meshes that can be realistically rendered in 3D engines.
What problem does this paper attempt to address?
The paper addresses three problems in 4D content generation: long optimization times, limited motion controllability, and low-quality details. Although existing 4D generation methods have made remarkable progress, these problems are common across them and limit their use in practical applications. For example, existing methods need several hours to generate a 4D scene, and the generated motion is often hard to control. In addition, the implicit 4D representations these methods use are inefficient, which results in slow rendering and poorly regularized motion. The stochastic nature of Score Distillation Sampling (SDS) also makes convergence harder and introduces instability and artifacts in the final results.
To overcome these challenges, the paper proposes DreamGaussian4D (DG4D), an efficient 4D generation framework that accelerates generation by combining an explicit 4D representation with video-driven optimization. Specifically, DG4D uses 3D Gaussian Splatting (GS) to represent the static 3D scene explicitly and HexPlane to model its time-dependent deformation. This reduces the optimization time from several hours to a few minutes, allows the generated 3D motion to be visually controlled, and produces animated meshes that can be realistically rendered in 3D engines.
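For context on why SDS is described as stochastic, its gradient can be written in the standard form from the text-to-3D literature. The symbols below are not defined in this summary and are assumptions for illustration: $\theta$ are the parameters of the 3D or 4D representation, $x = g(\theta)$ is a rendered image, $x_t$ is its noised version at diffusion timestep $t$, $\hat{\epsilon}_\phi$ is the noise prediction of the pre-trained diffusion model conditioned on $y$, and $w(t)$ is a timestep-dependent weight.

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\,\epsilon}\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right]
$$

Each optimization step estimates this expectation from a freshly sampled timestep $t$ and noise $\epsilon$, so the gradient is noisy from step to step. This is one way to read the convergence difficulty mentioned above.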
### Main contributions
1. **A principled image-to-4D generation framework**: The framework combines image-conditioned 3D generation with video generation models, so the desired 3D content and its motion can be directly controlled and selected, yielding high-quality and diverse 4D generation.
2. **Explicit 4D representation**: Gaussian Splatting and HexPlane explicitly represent the 4D scene and its deformation at different timestamps, which significantly reduces 4D generation time.
3. **Video-to-video texture refinement**: A video-to-video texture refinement strategy further improves the quality of the exported animated meshes while maintaining temporal consistency, making the framework more suitable for practical deployment.
4. **Superior performance**: Experimental results show that DG4D generates diverse 4D content with higher quality and shorter optimization time than existing methods.
### Method overview
DG4D mainly consists of two stages:
1. **Image-to-4D GS generation**: Generate a static 3D GS from the input image, then introduce realistic motion by optimizing a time-dependent deformation field, from which an animated mesh sequence is generated.
2. **Video-to-video texture refinement**: A video-to-video refinement pipeline uses a pre-trained image-to-video diffusion model to improve texture quality and enhance temporal consistency.
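The first stage's time-dependent deformation field can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the paper's implementation: the grid resolution, feature width, random (untrained) planes, the way plane features are fused, and the single linear decoder are all hypothetical stand-ins. It shows only the general HexPlane idea of looking up features on six 2D planes (xy, xz, yz, xt, yt, zt) and decoding them into a per-Gaussian displacement at time `t`.

```python
import numpy as np

rng = np.random.default_rng(0)

RES, FEAT = 16, 8  # hypothetical plane resolution and feature width

# Six feature planes over the axes (x=0, y=1, z=2, t=3):
# three spatial (xy, xz, yz) and three spatio-temporal (xt, yt, zt).
PLANE_AXES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
# Random values stand in for parameters that would be learned during optimization.
planes = [rng.normal(scale=0.1, size=(RES, RES, FEAT)) for _ in PLANE_AXES]
# A linear map stands in for the small decoder that turns features into a 3D offset.
decoder = rng.normal(scale=0.1, size=(3 * FEAT, 3))


def bilinear(plane, u, v):
    """Bilinearly sample a (RES, RES, FEAT) plane at coordinates u, v in [0, 1]."""
    res = plane.shape[0]
    x = np.clip(u, 0.0, 1.0) * (res - 1)
    y = np.clip(v, 0.0, 1.0) * (res - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, res - 1)
    y1 = np.minimum(y0 + 1, res - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = plane[x0, y0] * (1 - wx) + plane[x1, y0] * wx
    bottom = plane[x0, y1] * (1 - wx) + plane[x1, y1] * wx
    return top * (1 - wy) + bottom * wy


def deform(centers, t):
    """Displace static Gaussian centers (N, 3) in [0, 1]^3 to time t in [0, 1]."""
    n = centers.shape[0]
    coords = np.concatenate([centers, np.full((n, 1), t)], axis=1)  # (N, 4)
    feats = [bilinear(p, coords[:, a], coords[:, b])
             for p, (a, b) in zip(planes, PLANE_AXES)]
    # Fuse each spatial plane with its complementary temporal plane:
    # xy*zt, xz*yt, yz*xt (an assumed fusion scheme for this sketch).
    fused = np.concatenate(
        [feats[0] * feats[5], feats[1] * feats[4], feats[2] * feats[3]], axis=1
    )
    return centers + fused @ decoder


static_centers = rng.uniform(size=(100, 3))  # stand-in for the static GS centers
frames = [deform(static_centers, t) for t in np.linspace(0.0, 1.0, 5)]
```

The static Gaussians are never modified here. Each frame is obtained by querying the deformation field at a different `t`, which is what keeps the representation explicit and cheap to render.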
### Experimental results
The paper verifies the effectiveness of DG4D through quantitative and qualitative experiments. Compared with existing state-of-the-art methods, DG4D performs well in image alignment, 3D appearance, and motion quality, with a significantly shorter optimization time. For example, in the image-to-4D generation task, DG4D reaches a CLIP-I score of 0.9227, well above the other compared methods. In the video-to-4D generation task, the two DG4D variants (Ours-Fast and Ours) are significantly better than the baseline methods on the LPIPS, CLIP, and FVD metrics, while the optimization time is greatly reduced.
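As a rough illustration of an image-alignment score such as CLIP-I, the sketch below averages the cosine similarity between a reference image embedding and the embeddings of rendered views. It assumes CLIP-I is computed this way and that the embeddings have already been extracted with a CLIP image encoder. The function name `clip_i_score` and the toy vectors are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def clip_i_score(reference_embedding, rendered_embeddings):
    """Mean cosine similarity between one reference embedding (D,)
    and a batch of rendered-view embeddings (N, D)."""
    ref = reference_embedding / np.linalg.norm(reference_embedding)
    ren = rendered_embeddings / np.linalg.norm(
        rendered_embeddings, axis=1, keepdims=True
    )
    return float(np.mean(ren @ ref))


# Toy embeddings standing in for real CLIP image features.
reference = np.array([1.0, 0.0, 0.0])
identical_views = np.tile(reference, (4, 1))
orthogonal_views = np.tile(np.array([0.0, 1.0, 0.0]), (4, 1))

same_score = clip_i_score(reference, identical_views)
orthogonal_score = clip_i_score(reference, orthogonal_views)
```

Under this definition the score lies in [-1, 1], and values close to 1 indicate renders whose embeddings closely match the reference image.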
In conclusion, DG4D significantly improves the efficiency and quality of 4D content generation through innovative 4D representation and optimization strategies, and has broad application prospects.