Traffic Scene Generation from Natural Language Description for Autonomous Vehicles with Large Language Model

Bo-Kai Ruan, Hao-Tang Tsui, Yung-Hui Li, Hong-Han Shuai
2024-09-15
Abstract: Text-to-scene generation, transforming textual descriptions into detailed scenes, typically relies on generating key scenarios along predetermined paths, constraining environmental diversity and limiting customization flexibility. To address these limitations, we propose a novel text-to-traffic scene framework that leverages a large language model to generate diverse traffic scenarios within the Carla simulator based on natural language descriptions. Users can define specific parameters such as weather conditions, vehicle types, and road signals, while our pipeline can autonomously select the starting point and scenario details, generating scenes from scratch without relying on predetermined locations or trajectories. Furthermore, our framework supports both critical and routine traffic scenarios, enhancing its applicability. Experimental results indicate that our approach promotes diverse agent planning and road selection, enhancing the training of autonomous agents in traffic environments. Notably, our methodology has achieved a 16% reduction in average collision rates. Our work is made publicly available at https://basiclab.github.io/TTSG.
Robotics
What problem does this paper attempt to address?
This paper addresses the limited ability of existing methods to generate traffic scenes for autonomous vehicles from natural-language descriptions. Existing methods usually rely on predefined paths and locations, which restricts environmental diversity and customization flexibility. To overcome these limitations, the authors propose a new text-to-traffic-scene generation framework based on a large language model (LLM). The framework generates diverse traffic scenes in the Carla simulator from natural-language descriptions. Users can define specific parameters such as weather conditions, vehicle types, and road signals, and the system can autonomously select starting points and scene details without relying on predefined locations or trajectories. The framework also supports both critical and routine traffic scenes, which broadens its applicability. The experimental results indicate that the method promotes diverse agent planning and road selection and improves the training of autonomous agents in traffic environments, most notably achieving a 16% reduction in the average collision rate.