Abstract: Data simulation engines such as Unity are becoming an increasingly important data source because they let us acquire ground truth labels conveniently. Moreover, we can flexibly edit the content of an image in the engine, such as objects (position, orientation) and environments (illumination, occlusion). When simulated data are used as training sets, their editable content can be leveraged to mimic the distribution of real-world data and thus reduce the content difference between the synthetic and real domains. This paper explores content adaptation in the context of semantic segmentation, where complex street scenes are fully synthesized from a first-person driver perspective using 19 classes of virtual objects and are controlled by 23 attributes. To optimize the attribute values and obtain a training set whose content is similar to real-world data, we propose a scalable discretization-and-relaxation (SDR) approach. Under a reinforcement learning framework, we formulate attribute optimization as a random-to-optimized mapping problem using a neural network. Our method has three characteristics. 1) Instead of editing attributes of individual objects, we focus on global attributes that have a large influence on the scene structure, such as object density and illumination. 2) Attributes are quantized to discrete values to reduce the search space and training complexity. 3) Correlated attributes are jointly optimized in a group to avoid meaningless scene structures and find better convergence points. Experiments show that our system can generate reasonable and useful scenes, from which we obtain promising real-world segmentation accuracy compared with existing synthetic training sets.
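The second and third characteristics (quantized attributes, jointly optimized groups) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes a neural network mapping random inputs to optimized attributes, whereas this sketch swaps in a plain table of logits per group, updated with a REINFORCE-style rule. The attribute names, bin counts, grouping, and `toy_reward` (a stand-in for whatever signal measures similarity to real-world data) are all hypothetical.

```python
import itertools
import math
import random


def quantize(low, high, num_bins):
    """Return num_bins evenly spaced values covering [low, high]."""
    if num_bins == 1:
        return [(low + high) / 2]
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class GroupPolicy:
    """Categorical policy over the joint values of one attribute group."""

    def __init__(self, attr_bins):
        # attr_bins maps attribute name -> list of discrete values.
        self.names = list(attr_bins)
        # One choice per combination, so correlated attributes are picked together.
        self.choices = list(itertools.product(*(attr_bins[n] for n in self.names)))
        self.logits = [0.0] * len(self.choices)

    def sample(self, rng):
        probs = softmax(self.logits)
        idx = rng.choices(range(len(probs)), weights=probs)[0]
        return idx, dict(zip(self.names, self.choices[idx]))

    def update(self, idx, advantage, lr):
        # Gradient of log softmax w.r.t. logits: one_hot(idx) - probs.
        probs = softmax(self.logits)
        for k in range(len(self.logits)):
            grad = (1.0 if k == idx else 0.0) - probs[k]
            self.logits[k] += lr * advantage * grad

    def best(self):
        idx = max(range(len(self.logits)), key=self.logits.__getitem__)
        return dict(zip(self.names, self.choices[idx]))


def optimize(groups, reward_fn, steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    policies = [GroupPolicy(g) for g in groups]
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        picks = [p.sample(rng) for p in policies]
        attrs = {}
        for _, values in picks:
            attrs.update(values)
        reward = reward_fn(attrs)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for policy, (idx, _) in zip(policies, picks):
            policy.update(idx, advantage, lr)
    return {k: v for p in policies for k, v in p.best().items()}


# Hypothetical grouping: two density attributes optimized jointly, lighting alone.
groups = [
    {"car_density": quantize(0.0, 1.0, 5), "pedestrian_density": quantize(0.0, 1.0, 5)},
    {"light_intensity": quantize(0.0, 1.0, 5)},
]
# Hypothetical target standing in for "content similar to real-world data".
TARGET = {"car_density": 0.75, "pedestrian_density": 0.25, "light_intensity": 0.5}


def toy_reward(attrs):
    # Assumption: higher is better; 0 means the target is matched exactly.
    return -sum(abs(attrs[name] - TARGET[name]) for name in TARGET)


best = optimize(groups, toy_reward)
print(best, toy_reward(best))
```

Grouping makes the policy choose over the Cartesian product of a group's bins, so the group's size grows multiplicatively with the number of attributes it contains. This is one reason a coarse quantization helps keep the search space manageable.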

DIScene: Object Decoupling and Interaction Modeling for Complex Scene Generation

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis

Learning to Simulate Complex Scenes for Street Scene Segmentation

Learning 3D Scene Synthesis from Annotated RGB-D Images

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model

Diffusion-based Generation, Optimization, and Planning in 3D Scenes

DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Enhanced 3D Generation by 2D Editing

Knowledge-Guided Object Discovery with Acquired Deep Impressions

Comp4D: LLM-Guided Compositional 4D Scene Generation

Scene-Conditional 3D Object Stylization and Composition

Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting

ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

3D Scene Diffusion Guidance using Scene Graphs

DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting

Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches