Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang
2024-06-26
Abstract: Scene simulation in autonomous driving has gained significant attention because of its huge potential for generating customized data. However, existing editable scene simulation approaches face limitations in terms of user interaction efficiency, multi-camera photo-realistic rendering and external digital assets integration. To address these challenges, this paper introduces ChatSim, the first system that enables editable photo-realistic 3D driving scene simulations via natural language commands with external digital assets. To enable editing with high command flexibility, ChatSim leverages a large language model (LLM) agent collaboration framework. To generate photo-realistic outcomes, ChatSim employs a novel multi-camera neural radiance field method. Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent assets' rendering. Our experiments on Waymo Open Dataset demonstrate that ChatSim can handle complex language commands and generate corresponding photo-realistic scene videos.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper attempts to solve three key problems in autonomous driving scenario simulation:

1. **Complex user interaction requirements**: Existing editable scenario simulation methods are less efficient in handling complex or abstract user commands and cannot generate customized driving scenarios efficiently.
2. **Multi-view photo-realistic rendering**: Existing methods have deficiencies in photo-realistic rendering in multi-camera settings, especially when dealing with different exposure times and inconsistent camera poses.
3. **Integration of external digital assets**: Existing simulation methods have difficulty seamlessly integrating high-quality external digital assets (such as 3D models, textures, etc.) into the scene to meet customization requirements.

To solve these problems, the paper introduces the **ChatSim** system. ChatSim enables editable photo-realistic 3D driving scenario simulation through natural language commands and can integrate external digital assets. Specifically:

- **Natural-language-command-driven editing**: ChatSim utilizes a large language model (LLM) agent collaboration framework, allowing users to easily edit the scene through natural-language commands without writing code or performing complex intermediate steps.
- **Multi-camera Neural Radiance Field (McNeRF)**: To achieve photo-realistic rendering, ChatSim proposes the McNeRF method, which takes into account the different exposure times of multi-camera inputs and solves the problems of inconsistent brightness and camera-pose mismatching.
- **Multi-camera illumination estimation (McLight)**: To make external digital assets consistent with the scene illumination conditions, ChatSim proposes a new multi-camera illumination estimation method McLight, which combines sky-dome illumination and surrounding illumination estimation to ensure seamless integration of external assets and realistic shadow effects.

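To see why exposure time matters for multi-camera rendering, consider the toy sketch below. It is not the McNeRF method; it only assumes a simple camera model in which a pixel value is the linear scene radiance multiplied by the exposure time, clipped to [0, 1], and gamma-encoded. The function names, the gamma value, and the sample numbers are all hypothetical choices made for illustration.

```python
def expose(hdr_radiance, exposure_time, gamma=2.2):
    """Map linear scene radiance to a displayed pixel value in [0, 1].

    Assumed toy camera model: scale by exposure time, clip, gamma-encode.
    """
    linear = hdr_radiance * exposure_time
    clipped = min(max(linear, 0.0), 1.0)
    return clipped ** (1.0 / gamma)


def normalize_to_reference(pixel, exposure_time, reference_exposure, gamma=2.2):
    """Undo one camera's exposure and re-expose at a shared reference time.

    Only valid for pixels that were not clipped by the camera.
    """
    linear = pixel ** gamma                 # invert the gamma encoding
    radiance = linear / exposure_time       # recover linear scene radiance
    return expose(radiance, reference_exposure, gamma)


# The same surface point seen by two cameras with different exposure times.
radiance = 0.5
front_pixel = expose(radiance, exposure_time=1.0)
side_pixel = expose(radiance, exposure_time=0.5)

# Without accounting for exposure, the two cameras disagree on brightness.
brightness_gap = abs(front_pixel - side_pixel)

# After normalizing the side camera to the front camera's exposure, they agree.
side_normalized = normalize_to_reference(side_pixel, 0.5, reference_exposure=1.0)
residual_gap = abs(front_pixel - side_normalized)
```

Under this assumed model, a method that ignores exposure has to explain `brightness_gap` as a difference in scene appearance, which is one plausible source of the inconsistent brightness the summary mentions.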
Through these innovations, ChatSim can generate photo-realistic scene videos on the Waymo Open Dataset that conform to various human-language commands, thus providing a powerful tool for the testing of autonomous driving perception systems.
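The agent collaboration idea can be pictured with a small sketch, given here only as an illustration of the general pattern: a planner splits a command into sub-tasks and routes each one to a specialized agent. Everything in it is an assumption rather than ChatSim's actual design. The agent names, the keyword routing that stands in for an LLM planner, and the list-of-edits scene representation are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Edit = Tuple[str, str]
Scene = List[Edit]  # hypothetical scene state: an ordered log of edits


@dataclass
class Task:
    agent: str
    instruction: str


def plan(command: str) -> List[Task]:
    """Stand-in for an LLM planner: split on ' and ', route clauses by keyword."""
    routes = {"add": "asset_agent", "remove": "deletion_agent", "view": "view_agent"}
    tasks = []
    for clause in command.lower().split(" and "):
        clause = clause.strip()
        for keyword, agent in routes.items():
            if clause.startswith(keyword):
                tasks.append(Task(agent, clause))
                break
    return tasks


# Hypothetical agents: each takes the scene and an instruction, returns a new scene.
AGENTS: Dict[str, Callable[[Scene, str], Scene]] = {
    "asset_agent": lambda scene, text: scene + [("insert_asset", text)],
    "deletion_agent": lambda scene, text: scene + [("delete_object", text)],
    "view_agent": lambda scene, text: scene + [("change_view", text)],
}


def run(command: str, agents: Dict[str, Callable[[Scene, str], Scene]], scene: Scene) -> Scene:
    """Apply each planned sub-task to the scene in order."""
    for task in plan(command):
        scene = agents[task.agent](scene, task.instruction)
    return scene


result = run("Add a red car and remove the truck", AGENTS, [])
```

In a real system the keyword table would be replaced by a language model that interprets the command, and each agent would call rendering or asset tools instead of appending to a log.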