3D Scene Diffusion Guidance using Scene Graphs

Mohammad Naanaa, Katharina Schmid, Yinyu Nie
2023-08-08
Abstract: Guided synthesis of high-quality 3D scenes is a challenging task. Diffusion models have shown promise in generating diverse data, including 3D scenes. However, current methods rely directly on text embeddings for controlling the generation, limiting the incorporation of complex spatial relationships between objects. We propose a novel approach for 3D scene diffusion guidance using scene graphs. To leverage the relative spatial information the scene graphs provide, we make use of relational graph convolutional blocks within our denoising network. We show that our approach significantly improves the alignment between scene description and generated scene.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to align a generated 3D scene more accurately with its input description when synthesizing high-quality 3D scenes. Existing methods rely mainly on text embeddings to control generation, an approach that handles complex spatial relationships between objects poorly. The authors therefore propose conditioning on scene graphs: the denoising network uses relational graph convolutional blocks to exploit the relative spatial information the scene graph provides, which significantly improves the alignment between the generated scene and the given conditions. The main contributions of the paper are:
- A new 3D scene diffusion guidance method that uses the scene graph as the condition.
- A novel technique for conditioning matrix-shaped data on the scene graph using a relational graph convolutional network.
- Experimental verification that the method significantly improves the alignment between the generated scene and the given conditions.
Through quantitative and qualitative evaluation, the paper reports that the proposed method achieves superior performance in generating 3D scenes that conform to complex input descriptions.
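The relational graph convolution mentioned above can be illustrated with a minimal sketch. It shows only the generic relational GCN update (a self-connection plus mean-normalised, relation-specific messages from neighbours, followed by ReLU), not the authors' actual denoising block. The function name `rgcn_layer`, the relation labels `left_of` and `on`, and all weights and features are hypothetical values chosen for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(features, edges, W_self, W_rel):
    """One relational graph convolution step (illustrative, not the paper's code).

    features: list of feature vectors, one per scene-graph node (object).
    edges:    list of (src, relation, dst) triples.
    W_self:   weight matrix applied to each node's own features.
    W_rel:    dict mapping relation name to that relation's weight matrix.
    """
    # Count incoming edges per (destination, relation) for mean normalisation.
    counts = {}
    for _, rel, dst in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1

    # Self-connection term for every node.
    out = [matvec(W_self, h) for h in features]

    # Add normalised, relation-specific messages from each source node.
    for src, rel, dst in edges:
        msg = matvec(W_rel[rel], features[src])
        norm = counts[(dst, rel)]
        out[dst] = [o + m / norm for o, m in zip(out[dst], msg)]

    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in out]


# Toy scene graph with three objects and two relation types (assumed values).
identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, "left_of", 1), (2, "left_of", 1), (1, "on", 2)]
W_rel = {"left_of": identity, "on": [[2.0, 0.0], [0.0, 2.0]]}

updated = rgcn_layer(features, edges, identity, W_rel)
```

Because each relation type has its own weight matrix, a "left_of" edge and an "on" edge from the same neighbour contribute differently to a node's updated features, which is what lets the relative spatial information in the graph influence the result.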