ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

Pengzhi Li, Chengshuai Tang, Qinxuan Huang, Zhiheng Li
2024-05-17
Abstract: In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate a point cloud map, addressing domain differences. Additionally, we propose a depth consistency module to enhance 3D scene consistency. Finally, the 3D scene serves as initial points for optimizing Gaussian splats. Experimental results demonstrate ART3D's superior performance in both content and structural consistency metrics when compared to existing methods. ART3D significantly advances the field of AI in art creation by providing an innovative solution for generating high-quality 3D artistic scenes.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to generate high-quality 3D artistic scenes from text descriptions or reference images. Existing methods for 3D artistic scene generation face three main challenges:

1. **Domain gap**: Existing 3D generation models are usually trained on real-world scenes, whose style differs significantly from that of artworks. This gap makes it difficult to apply these models directly to art-style 3D scenes.
2. **Lack of 3D art training data**: Unlike 2D art generation, 3D art generation lacks large-scale art-style 3D datasets, which limits how well models generalize to the artistic domain.
3. **Multi-view consistency**: A generated 3D scene must remain coherent across viewpoints. Existing methods handle this poorly, producing structures and details that are inconsistent between views.

To address these problems, the paper proposes the ART3D framework, which combines diffusion models with 3D Gaussian splatting and introduces the following components:

- **Image semantic transfer algorithm**: Semantic features are extracted from the text or reference image provided by the user and used to generate a realistic image with the same semantic layout, bridging the domain gap between artistic and realistic images.
- **Depth consistency module**: This module enforces global and local consistency during multi-view generation and improves the overall quality of the 3D scene.
- **3D Gaussian splatting**: The initial point cloud map serves as the starting points for optimizing Gaussian splats, which turn the discrete point cloud into a continuous representation from which the final 3D scene is rendered.

The authors report that ART3D outperforms existing methods on both content and structural consistency metrics.
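
The point cloud step mentioned in the abstract (depth information plus an initial artistic image) can be illustrated with a standard pinhole back-projection. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `backproject`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy values are all hypothetical, and the paper may use a different camera model or depth source.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Color = Tuple[int, int, int]


def backproject(
    depth: List[List[float]],
    image: List[List[Color]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[Point, Color]]:
    """Lift every pixel with a valid depth into camera-space 3D.

    Assumes a pinhole camera (an assumption, not taken from the paper):
    pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx and y = (v - cy) * z / fy.
    """
    points = []
    for v, (depth_row, color_row) in enumerate(zip(depth, image)):
        for u, (z, color) in enumerate(zip(depth_row, color_row)):
            if z <= 0.0:  # skip pixels with missing or invalid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), color))
    return points


# Toy 2x2 depth map and image (hypothetical values).
depth = [[1.0, 2.0], [0.0, 4.0]]
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
cloud = backproject(depth, image, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each entry of `cloud` pairs a 3D position with the color of its source pixel. A colored point set of this kind is the sort of input that could seed Gaussian splat optimization, with each point providing an initial splat center and color.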