DIScene: Object Decoupling and Interaction Modeling for Complex Scene Generation

Xiao-Lei Li, Haodong Li, Hao-Xiang Chen, Tai-Jiang Mu, Shi-Min Hu
DOI: https://doi.org/10.1145/3680528.3687589
2024-01-01
Abstract: This paper reconsiders how to distill knowledge from pretrained 2D diffusion models to guide 3D asset generation, in particular the generation of complex 3D scenes. Such a method should accept varied inputs, i.e., texts or images, so that requirements can be expressed flexibly; objects in the scene should be style-consistent and decoupled, with clearly modeled interactions, which benefits downstream tasks. We propose DIScene, a novel method for this task. It represents the entire 3D scene with a learnable structured scene graph: each node explicitly models an object with its appearance, textual description, transformation, and geometry, the last represented as a mesh with attached surface-aligned Gaussians; the graph’s edges model object interactions. With this new representation, objects are optimized in their canonical space, and interactions between objects are optimized through object-aware rendering to avoid incorrect back-propagation. Extensive experiments demonstrate the significant utility and superiority of our approach and show that DIScene can greatly facilitate 3D content creation tasks.
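The abstract describes the scene representation only in words, so a small data-structure sketch may help make it concrete. This is not the authors' implementation: every class and field name below (SurfaceGaussian, Transform, ObjectNode, InteractionEdge, SceneGraph) is hypothetical. It also assumes, for simplicity, that each Gaussian is anchored to one mesh triangle through barycentric weights, that an object's transformation is a uniform scale plus a rotation about the vertical axis plus a translation, and that an interaction edge is just a free-text relation between two named objects. It illustrates one reading of "optimized in the canonical space": geometry lives in object-local coordinates, and only the node's transformation places it in the scene.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SurfaceGaussian:
    # Hypothetical: a Gaussian anchored to one mesh triangle via barycentric
    # weights, so its center stays on the (canonical-space) mesh surface.
    face: int
    bary: tuple[float, float, float]
    scale: float
    color: tuple[float, float, float]


@dataclass
class Transform:
    # Assumed object-to-scene transform: uniform scale, then rotation about
    # the z axis by `yaw` radians, then translation.
    scale: float = 1.0
    yaw: float = 0.0
    translation: tuple[float, float, float] = (0.0, 0.0, 0.0)

    def apply(self, p):
        x, y, z = (self.scale * v for v in p)
        cos_t, sin_t = math.cos(self.yaw), math.sin(self.yaw)
        x, y = cos_t * x - sin_t * y, sin_t * x + cos_t * y
        tx, ty, tz = self.translation
        return (x + tx, y + ty, z + tz)


@dataclass
class ObjectNode:
    name: str
    description: str   # per-object text description
    vertices: list     # canonical-space mesh vertices (x, y, z)
    faces: list        # triangles as vertex-index triples
    gaussians: list = field(default_factory=list)
    transform: Transform = field(default_factory=Transform)

    def canonical_centers(self):
        # Gaussian centers as barycentric combinations of their triangle's vertices.
        centers = []
        for g in self.gaussians:
            a, b, c = (self.vertices[i] for i in self.faces[g.face])
            w0, w1, w2 = g.bary
            centers.append(tuple(w0 * a[k] + w1 * b[k] + w2 * c[k] for k in range(3)))
        return centers

    def world_centers(self):
        # Only the node's transform maps canonical geometry into the scene.
        return [self.transform.apply(p) for p in self.canonical_centers()]


@dataclass
class InteractionEdge:
    source: str
    target: str
    relation: str  # e.g. "resting on" (free text, an assumption)


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_edge(self, source, target, relation):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append(InteractionEdge(source, target, relation))

    def neighbors(self, name):
        # All interaction edges that touch the named object.
        return [e for e in self.edges if name in (e.source, e.target)]
```

In this sketch, changing a node's Transform moves all of its Gaussians together without touching the mesh or the barycentric anchors, which is one simple way to keep per-object geometry decoupled from scene layout.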