Disentangled 3D Scene Generation with Layout Learning

Dave Epstein,Ben Poole,Ben Mildenhall,Alexei A. Efros,Aleksander Holynski
DOI: https://doi.org/10.48550/arXiv.2402.16936
2024-02-27
Abstract:We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch - each representing its own object - along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at <a class="link-external link-https" href="https://dave.ml/layoutlearning/" rel="external noopener nofollow">this https URL</a>
Computer Vision and Pattern Recognition,Machine Learning
What problem does this paper attempt to address?