CT4D: Consistent Text-to-4D Generation with Animatable Meshes

Ce Chen, Shaoli Huang, Xuelin Chen, Guangyi Chen, Xiaoguang Han, Kun Zhang, Mingming Gong
2024-08-15
Abstract: Text-to-4D generation has recently been shown to be viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user-supplied prompts. The primary challenges of our mesh-based framework involve stably generating a mesh with details that align with the text prompt while directly driving it and maintaining surface continuity. Our CT4D framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes. To improve surface continuity, we divide a mesh into several smaller regions and implement a uniform driving function within each area. Additionally, we constrain the animating stage with a rigidity regulation to ensure cross-region continuity. Our experimental results, both qualitative and quantitative, demonstrate that our CT4D framework surpasses existing text-to-4D techniques in maintaining interframe consistency and preserving global geometry. Furthermore, we showcase that this enhanced representation inherently possesses the capability for combinational 4D generation and texture editing.
Graphics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two problems in the outputs of existing text-to-4D generation methods: temporal inconsistency and geometric distortion. Specifically:

1. **Temporal inconsistency**: When existing text-to-4D methods generate dynamic scenes, motion is often inconsistent between frames, which degrades visual quality; jitter is especially visible along object edges.
2. **Geometric structure distortion**: These methods struggle to keep the geometry stable once motion is added, so generated objects become distorted during the dynamic phase, particularly at edges and in fine details.

To overcome these problems, the paper proposes CT4D (Consistent Text-to-4D Generation with Animatable Meshes), a framework built on animatable triangle meshes. It tackles the problems above through the following components:

- **Explicit representation**: Animatable triangle meshes serve as the 4D representation and explicitly separate geometry from texture, which makes it easier to keep the geometry stable and the motion temporally consistent.
- **Generate-Refine-Animate (GRA) algorithm**: A three-stage algorithm first generates, then refines, high-quality geometry and textures aligned with the text prompt, and finally animates the mesh so that the motion is smooth and continuous.
- **Surface continuity**: Vertex clustering divides the mesh into regions, each driven by a uniform driving function, and a rigidity regulation constrains the animation so that the surface stays continuous across regions and geometric distortion is avoided.

With these components, CT4D improves inter-frame consistency and geometric stability when generating 4D content. The paper's qualitative and quantitative experiments show that CT4D outperforms existing text-to-4D methods in both respects.
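The region-wise animation and rigidity regulation can be illustrated with a small NumPy sketch. This is not the paper's implementation. It assumes that vertex clustering is plain k-means on vertex positions, that each region's "uniform driving function" is a single rigid transform (an axis-angle rotation about the region center plus a translation), and that the rigidity regulation is an edge-length preservation penalty with optional extra weight on edges that cross region borders. All function names (`cluster_vertices`, `drive_regions`, `mesh_edges`, `rigidity_loss`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: k-means regions, one rigid transform per region,
# and an edge-length rigidity penalty. The paper's formulation may differ.


def cluster_vertices(vertices, num_regions, iters=20, seed=0):
    """Split mesh vertices into regions with plain k-means on 3D positions."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(vertices), num_regions, replace=False)
    centers = vertices[idx].astype(float)  # fancy indexing copies; vertices is untouched
    labels = np.zeros(len(vertices), dtype=int)
    for _ in range(iters):
        # Assign every vertex to its nearest region center.
        dists = np.linalg.norm(vertices[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the region is empty).
        for k in range(num_regions):
            members = vertices[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return labels, centers


def axis_angle_to_matrix(axis_angle):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-12:
        return np.eye(3)
    kx, ky, kz = axis_angle / theta
    K = np.array([[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def drive_regions(vertices, labels, centers, rotations, translations):
    """Assumed uniform driving: v' = R_k (v - c_k) + c_k + t_k for every vertex in region k."""
    out = np.array(vertices, dtype=float)
    for k in range(len(centers)):
        mask = labels == k
        R = axis_angle_to_matrix(rotations[k])
        out[mask] = (vertices[mask] - centers[k]) @ R.T + centers[k] + translations[k]
    return out


def mesh_edges(faces):
    """Unique undirected edges of a triangle mesh given as an (F, 3) index array."""
    e = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]], axis=0)
    return np.unique(np.sort(e, axis=1), axis=0)


def rigidity_loss(rest, deformed, edges, labels=None, boundary_weight=1.0):
    """Weighted mean squared change in edge length between rest and deformed mesh.

    Edges whose endpoints fall in different regions get boundary_weight, so
    raising it penalizes stretching across region borders more heavily.
    """
    i, j = edges[:, 0], edges[:, 1]
    rest_len = np.linalg.norm(rest[i] - rest[j], axis=1)
    def_len = np.linalg.norm(deformed[i] - deformed[j], axis=1)
    w = np.ones(len(edges))
    if labels is not None:
        w[labels[i] != labels[j]] = boundary_weight
    return float(np.sum(w * (def_len - rest_len) ** 2) / np.sum(w))


# Usage on a single tetrahedron.
rest = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
edges = mesh_edges(faces)                      # 6 unique edges
print(rigidity_loss(rest, 2.0 * rest, edges))  # uniform scaling by 2: about 1.5
```

Rigid per-region motion leaves edges inside a region unchanged, so in this sketch the penalty is driven by edges that straddle region borders, which is where continuity can break. In an actual optimization the per-region rotations and translations would be learnable parameters, and the loss would be written in an autodiff framework such as PyTorch or JAX so gradients can flow; NumPy here only evaluates the quantities.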