Abstract: Controllable 3D indoor scene synthesis stands at the forefront of technological progress, with applications in gaming, film, and augmented/virtual reality. The capability to stylize and decouple objects within these scenes is a crucial factor, providing an advanced level of control throughout the editing process. This control extends not just to manipulating geometric attributes like translation and scaling but also to managing appearance, such as stylization. Current methods for scene stylization are limited to applying styles to the entire scene, without the ability to separate and customize individual objects. To address this challenge, we introduce a pipeline designed for synthesizing 3D indoor scenes. Our approach places objects within the scene using information from professionally designed bounding boxes. Significantly, our pipeline prioritizes maintaining style consistency across multiple objects within the scene, ensuring a cohesive and visually appealing result aligned with the desired aesthetic. The core strength of our pipeline lies in its ability to generate 3D scenes that are not only visually impressive but also exhibit photorealism, multi-view consistency, and diversity. These scenes are generated in response to various natural language prompts, demonstrating the versatility and adaptability of our model.

What problem does this paper attempt to address?

This paper addresses style control and object decoupling in 3D indoor scene synthesis. Current methods can only apply a style to the entire scene and cannot customize or stylize individual objects separately. The paper introduces a new pipeline that places objects in the scene using professionally designed bounding box information. The key innovation is maintaining style consistency among multiple objects in the scene while generating 3D scenes with photorealism, multi-view consistency, and diversity. The approach generates scenes from natural language prompts, demonstrating the flexibility and adaptability of the model. Because each object is a separate mesh, individual objects can be stylized independently, which increases editing freedom. Compared to existing techniques, the experiments show improvements in 3D stylization in terms of visual consistency, spatial layout, and aesthetic quality. Future research may address limitations of the method, such as the restriction of object placement to regularly shaped rooms.
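To make the idea of decoupled objects concrete, the following is a minimal sketch of how per-object geometry (translation and scaling into a bounding box) can be kept separate from per-object appearance (a style prompt). It is not the paper's implementation: the names `BoundingBox`, `SceneObject`, `fit_to_box`, `place`, and `restyle` are hypothetical, meshes are reduced to bare vertex lists, and the actual text-driven stylization step is omitted entirely.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class BoundingBox:
    # Hypothetical layout record: axis-aligned box given by centre and full size.
    center: Vec3
    size: Vec3


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical decoupled object: its own geometry and its own style prompt.
    name: str
    vertices: Tuple[Vec3, ...]
    style_prompt: str


def fit_to_box(vertices: Tuple[Vec3, ...], box: BoundingBox) -> Tuple[Vec3, ...]:
    """Scale and translate vertices so their axis-aligned bounds match the box."""
    mins = [min(v[i] for v in vertices) for i in range(3)]
    maxs = [max(v[i] for v in vertices) for i in range(3)]
    fitted = []
    for v in vertices:
        point = []
        for i in range(3):
            extent = maxs[i] - mins[i]
            # Normalise each coordinate to [-0.5, 0.5]; a flat axis maps to the centre.
            unit = (v[i] - mins[i]) / extent - 0.5 if extent > 0 else 0.0
            point.append(box.center[i] + unit * box.size[i])
        fitted.append(tuple(point))
    return tuple(fitted)


def place(obj: SceneObject, box: BoundingBox) -> SceneObject:
    """Return a copy of the object positioned inside the given bounding box."""
    return replace(obj, vertices=fit_to_box(obj.vertices, box))


def restyle(scene: List[SceneObject], name: str, prompt: str) -> List[SceneObject]:
    """Change the style prompt of one object; all other objects are untouched."""
    return [replace(o, style_prompt=prompt) if o.name == name else o for o in scene]


# Two opposite corners stand in for a full mesh in this toy example.
corners = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
sofa = place(SceneObject("sofa", corners, "mid-century modern"),
             BoundingBox(center=(2.0, 0.5, 3.0), size=(2.0, 1.0, 1.0)))
lamp = place(SceneObject("lamp", corners, "mid-century modern"),
             BoundingBox(center=(0.0, 1.0, 0.0), size=(0.5, 2.0, 0.5)))

scene = restyle([sofa, lamp], "lamp", "art deco brass")
```

Because geometry and style live on each object rather than on the scene as a whole, editing the lamp's prompt leaves the sofa's geometry and style unchanged, which is the kind of per-object editing freedom the summary describes.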

Style-Consistent 3D Indoor Scene Synthesis with Decoupled Objects