Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Chenghao Li, Chaoning Zhang, Joseph Cho, Atish Waghwase, Lik-Hang Lee, Francois Rameau, Yang Yang, Sung-Ho Bae, Choong Seon Hong
2024-10-25
Abstract: Generative AI has made significant progress in recent years, with text-guided content generation being the most practical as it facilitates interaction between human instructions and AI-generated content (AIGC). Thanks to advancements in text-to-image and 3D modeling technologies, like neural radiance fields (NeRF), text-to-3D has emerged as a nascent yet highly active research field. Our work conducts a comprehensive survey on this topic and follows up on subsequent research progress in the overall field, aiming to help readers interested in this direction quickly catch up with its rapid development. First, we introduce 3D data representations, including both structured and non-structured data. Building on this prerequisite, we introduce various core technologies to achieve satisfactory text-to-3D results. Additionally, we present mainstream baselines and research directions in recent text-to-3D technology, including fidelity, efficiency, consistency, controllability, diversity, and applicability. Furthermore, we summarize the usage of text-to-3D technology in various applications, including avatar generation, texture generation, scene generation, and 3D editing. Finally, we discuss the agenda for the future development of text-to-3D.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the technical challenges of generating 3D models from text (text-to-3D). Specifically, it focuses on the following aspects:

1. **Data Representation**: 3D data can be represented in several ways, including structured data (such as voxel grids and multi-view images) and non-structured data (such as 3D meshes, point clouds, and neural fields). Each representation has its own trade-offs in representational capacity, computational efficiency, and memory efficiency.
2. **Core Technologies**: The paper introduces the core technologies behind text-to-3D generation, such as Neural Radiance Fields (NeRF) and diffusion models, which play a key role in producing high-quality 3D models.
3. **Limitations of Existing Methods**: Early text-to-3D methods relied on pre-trained image-text models, which partly alleviated the scarcity of 3D training data, but the resulting 2D renderings were often unrealistic. To address this, subsequent methods such as DreamFusion and SJC introduced stronger image priors (pre-trained text-to-image diffusion models) together with new optimization strategies.
4. **Application Areas**: Text-to-3D technology is applied in many areas, including avatar generation, texture generation, scene generation, and 3D editing. The paper summarizes these applications and discusses future development directions.
5. **Future Research Directions**: Despite significant progress, text-to-3D still faces challenges such as low fidelity, long inference time, inconsistency across views, poor controllability, and limited diversity. The paper reviews methods that target these issues, for example by enhancing fidelity, accelerating inference, and improving consistency and controllability.

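To make the structured versus non-structured distinction in item 1 concrete, here is a minimal sketch (not taken from the paper) that converts a point cloud, a non-structured representation, into a dense voxel occupancy grid, a structured one. The function names `voxelize` and `occupied_cells`, the normalization cube [-1, 1]^3, and the nested-list grid layout are hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = List[List[List[bool]]]


def voxelize(points: Sequence[Point], resolution: int,
             lo: float = -1.0, hi: float = 1.0) -> Grid:
    """Rasterize an unstructured point cloud into a dense occupancy grid.

    Hypothetical assumption: the cloud has already been normalized into [lo, hi]^3.
    """
    if resolution <= 0 or hi <= lo:
        raise ValueError("need resolution > 0 and hi > lo")
    # The dense grid allocates resolution**3 cells however sparse the input is.
    grid = [[[False] * resolution for _ in range(resolution)]
            for _ in range(resolution)]
    scale = resolution / (hi - lo)
    for point in points:
        if any(c < lo or c > hi for c in point):
            raise ValueError(f"point {point} lies outside [{lo}, {hi}]^3")
        # Map each coordinate to a cell index; clamp c == hi into the last cell.
        i, j, k = (min(int((c - lo) * scale), resolution - 1) for c in point)
        grid[i][j][k] = True
    return grid


def occupied_cells(grid: Grid) -> int:
    """Count occupied voxels (several points may fall into the same cell)."""
    return sum(cell for plane in grid for row in plane for cell in row)
```

With `resolution=4`, the points `(0.0, 0.0, 0.0)` and `(0.01, 0.01, 0.01)` land in the same voxel, so detail below the cell size is lost, while the grid still costs 4³ = 64 cells regardless of how few points there are. This is the memory-versus-regularity trade-off that separates grid-based from point- or mesh-based representations.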
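As a minimal sketch of how a NeRF turns a neural field into pixels, the snippet below implements the discrete volume-rendering quadrature from the original NeRF formulation for a single ray: a sample with density σ_i over a segment of length δ_i gets opacity α_i = 1 − exp(−σ_i δ_i) and weight T_i α_i, where the transmittance is T_i = ∏_{j<i} (1 − α_j). Densities and colors are passed in directly here; in an actual NeRF they would be predicted by an MLP queried at points along the ray. The name `render_ray` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(sigmas: Sequence[float], colors: Sequence[RGB],
               deltas: Sequence[float]) -> Tuple[RGB, List[float]]:
    """Alpha-composite per-sample densities and colors along one ray."""
    if not len(sigmas) == len(colors) == len(deltas):
        raise ValueError("sigmas, colors and deltas must have the same length")
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    pixel = [0.0, 0.0, 0.0]
    weights: List[float] = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of this sample
        weights.append(weight)
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha            # occlusion seen by later samples
    return (pixel[0], pixel[1], pixel[2]), weights
```

For example, `render_ray([0.0, 1e9], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0])` returns the color `(0.0, 1.0, 0.0)`: the empty first sample contributes nothing and the nearly opaque second sample absorbs the whole ray. Because every step is differentiable, a loss on the rendered pixel can be back-propagated to the densities and colors, which is what allows 2D supervision to optimize a 3D field.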
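The "stronger image prior" that DreamFusion brought to text-to-3D (item 3) is Score Distillation Sampling (SDS). The expression below is restated in the notation of the DreamFusion paper rather than taken from this survey: θ are the 3D parameters, x = g(θ) is a differentiable rendering, y is the text prompt, t is a diffusion timestep, ε ~ N(0, I) is Gaussian noise, α_t and σ_t come from the noise schedule, w(t) is a weighting function, and ε̂_φ is the noise predictor of a frozen, pre-trained text-to-image diffusion model.

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta)\big) \triangleq \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right], \qquad z_t = \alpha_t x + \sigma_t \epsilon
$$

The Jacobian of the diffusion network itself is dropped, so the 2D model acts only as a critic: the residual between predicted and injected noise nudges each rendering toward images the model considers likely for the prompt, and that signal is pushed back into θ through the renderer.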
Overall, through a comprehensive review and analysis, this paper aims to help readers quickly understand the latest developments in the text-to-3D field and provide guidance for future research.