Abstract: Scene simulation in autonomous driving has gained significant attention because of its huge potential for generating customized data. However, existing editable scene simulation approaches face limitations in terms of user interaction efficiency, multi-camera photo-realistic rendering and external digital assets integration. To address these challenges, this paper introduces ChatSim, the first system that enables editable photo-realistic 3D driving scene simulations via natural language commands with external digital assets. To enable editing with high command flexibility, ChatSim leverages a large language model (LLM) agent collaboration framework. To generate photo-realistic outcomes, ChatSim employs a novel multi-camera neural radiance field method. Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent assets' rendering. Our experiments on Waymo Open Dataset demonstrate that ChatSim can handle complex language commands and generate corresponding photo-realistic scene videos.

What problem does this paper attempt to address?

This paper attempts to solve three key problems in autonomous driving scenario simulation:

1. **Complex user interaction requirements**: Existing editable scenario simulation methods are less efficient in handling complex or abstract user commands and cannot generate customized driving scenarios efficiently.
2. **Multi-view photo-realistic rendering**: Existing methods have deficiencies in photo-realistic rendering in multi-camera settings, especially when dealing with different exposure times and inconsistent camera poses.
3. **Integration of external digital assets**: Existing simulation methods have difficulty seamlessly integrating high-quality external digital assets (such as 3D models, textures, etc.) into the scene to meet customization requirements.

To solve these problems, the paper introduces the **ChatSim** system. ChatSim enables editable photo-realistic 3D driving scenario simulation through natural language commands and can integrate external digital assets. Specifically:

- **Natural-language-command-driven editing**: ChatSim utilizes a large language model (LLM) agent collaboration framework, allowing users to easily edit the scene through natural-language commands without writing code or performing complex intermediate steps.
- **Multi-camera Neural Radiance Field (McNeRF)**: To achieve photo-realistic rendering, ChatSim proposes the McNeRF method, which takes into account the different exposure times of multi-camera inputs and solves the problems of inconsistent brightness and camera-pose mismatching.
- **Multi-camera illumination estimation (McLight)**: To make external digital assets consistent with the scene illumination conditions, ChatSim proposes a new multi-camera illumination estimation method, McLight, which combines sky-dome illumination and surrounding illumination estimation to ensure seamless integration of external assets and realistic shadow effects.

Through these innovations, ChatSim can generate photo-realistic scene videos on the Waymo Open Dataset that conform to various human-language commands, thus providing a powerful tool for the testing of autonomous driving perception systems.
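To see why differing exposure times matter for multi-camera rendering, the sketch below shows how the same scene radiance can produce different pixel brightness on two cameras. It assumes a simple camera model in which the pixel value is the radiance scaled by the exposure time and then passed through the standard sRGB transfer function. This model, the function names `srgb_oetf` and `render_pixel`, and the exposure values are illustrative assumptions, not necessarily McNeRF's actual formulation.

```python
def srgb_oetf(linear: float) -> float:
    """Standard sRGB transfer function: linear light in [0, 1] -> display value."""
    linear = min(max(linear, 0.0), 1.0)  # clip to the displayable range
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1.0 / 2.4) - 0.055


def render_pixel(radiance, exposure_time):
    """Hypothetical camera model (an assumption): pixel = OETF(radiance * exposure_time)."""
    return [srgb_oetf(channel * exposure_time) for channel in radiance]


# One scene point observed by two cameras with different, made-up exposure times.
radiance = [0.2, 0.4, 0.6]
front = render_pixel(radiance, exposure_time=1.0)
side = render_pixel(radiance, exposure_time=0.5)

# The side camera sees the same point as darker in every channel.
print(front)
print(side)
```

Under this assumed model, a radiance field trained without the exposure term would have to explain two different colors for one scene point, which is one way inconsistent brightness across cameras can arise.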

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

Learning to Simulate Complex Scenes for Street Scene Segmentation

ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles

GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow

ChatDyn: Language-Driven Multi-Actor Dynamics Generation in Street Scenes

SimGen: Simulator-conditioned Driving Scene Generation

SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models

SurrealDriver: Designing Generative Driver Agent Simulation Framework in Urban Contexts based on Large Language Model

GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving

Natural-language-driven Simulation Benchmark and Copilot for Efficient Production of Object Interactions in Virtual Road Scenes

Car-Studio: Learning Car Radiance Fields from Single-View and Endless In-the-wild Images

Car-Studio: Learning Car Radiance Fields From Single-View and Unlimited In-the-Wild Images

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

SIMS: Simulating Human-Scene Interactions with Real World Script Planning

An End-to-End Driver Simulator for Personal In-Vehicle Conversational Assistant

SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Immersive Traffic Interactive Simulator for Multi-Agent