Versatile Behavior Diffusion for Generalized Traffic Agent Simulation

Zhiyu Huang, Zixu Zhang, Ameya Vaidya, Yuxiao Chen, Chen Lv, Jaime Fernández Fisac
2024-12-03
Abstract: Existing traffic simulation models often fail to capture the complexities of real-world scenarios, limiting the effective evaluation of autonomous driving systems. We introduce Versatile Behavior Diffusion (VBD), a novel traffic scenario generation framework that utilizes diffusion generative models to predict scene-consistent and controllable multi-agent interactions in closed-loop settings. VBD achieves state-of-the-art performance on the Waymo Sim Agents Benchmark and can effectively produce realistic and coherent traffic behaviors with complex agent interactions under diverse environmental conditions. Furthermore, VBD offers inference-time scenario editing through multi-step refinement guided by behavior priors and model-based optimization objectives. This capability allows for controllable multi-agent behavior generation, accommodating a wide range of user requirements across various traffic simulation applications. Despite being trained solely on publicly available datasets representing typical traffic conditions, we introduce conflict-prior and game-theoretic guidance approaches that enable the creation of interactive, long-tail safety-critical scenarios, which is essential for comprehensive testing and validation of autonomous vehicles. Lastly, we provide in-depth insights into effective training and inference strategies for diffusion-based traffic scenario generation models, highlighting best practices and common pitfalls. Our work significantly advances the ability to simulate complex traffic environments, offering a powerful tool for the development and assessment of autonomous driving technologies.
Robotics
What problem does this paper attempt to address?
This paper addresses the problem of generating realistic and controllable multi-agent interaction behaviors in traffic simulation. Existing traffic simulation models often fail to capture the complexity of real-world scenarios, which limits how effectively autonomous driving systems can be evaluated. Specifically, existing approaches have the following deficiencies:

1. **Lack of interactivity**: Replay methods (such as trajectory replay from driving datasets) do not account for how other traffic participants react, so they cannot accurately simulate interactive scenarios such as pedestrians crossing the road or vehicles negotiating the right-of-way at four-way stops. This creates a significant gap between simulation and reality: simulation test results may not reflect actual performance.
2. **Model limitations**: Heuristic-based methods scale poorly to complex real-world traffic situations (such as navigating around construction areas, pedestrians suddenly entering crosswalks, or creeping forward at crowded intersections) and cannot comprehensively capture traffic dynamics.
3. **Limitations of generative models**: Although some recent studies use large-scale driving datasets and behavior cloning to model traffic behaviors more accurately, these methods still focus mainly on predicting marginal open-loop trajectories of individual agents, which can lead to scene inconsistencies and even collisions, undermining the realism and reliability of the simulated scenes.

To address these problems, the paper proposes a new traffic scenario generation framework named **Versatile Behavior Diffusion (VBD)**.
VBD uses diffusion generative models to predict scene-consistent and controllable multi-agent interaction behaviors. It generates realistic and coherent traffic behaviors in closed-loop settings and under diverse environmental conditions. VBD also supports scenario editing at inference time: multi-step refinement guided by behavior priors and model-based optimization objectives enables controllable multi-agent behavior generation that meets a wide range of user needs across traffic simulation applications.

### Main contributions:

1. **The VBD model**: It generates realistic and controllable traffic agent behaviors and achieves strong closed-loop simulation performance.
2. **Flexibility of VBD**: It generates diverse user-specified scenarios through a flexible guidance scheme that is compatible with optimization objectives, behavior priors, and game-theoretic structures.
3. **In-depth study of training and inference strategies**: Extensive empirical studies examine how various training and inference settings affect multi-agent behavior generation with diffusion models, providing insights for future research.

### Technical details:

- **Diffusion generative model**: VBD uses a diffusion model (also known as a score-based model) to gradually recover structured data from random noise and generate multi-agent interaction behaviors.
- **Scene encoder**: A Transformer encoder with query-centric attention encodes the scene context into a latent representation.
- **Denoiser**: Directly predicts the joint clean control sequence from the latent representation and the noised control sequence.
- **Behavior predictor**: Predicts the marginal categorical trajectory distribution of each agent from the latent representation, using representative static end-point anchors extracted from the data.
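The denoising process described in the technical details can be illustrated with a minimal sketch. It is not the authors' implementation. The linear noise schedule, the deterministic DDIM-style update, the control tensor shape `(agents, horizon, 2)`, and every name (`make_schedule`, `toy_denoiser`, `sample_joint_controls`, `guidance_grad`) are assumptions introduced here for illustration. The toy denoiser stands in for a learned network that predicts the clean joint control sequence, and the optional guidance hook shows one simple way a cost gradient could steer the prediction at inference time.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; returns cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def toy_denoiser(noisy_controls, step, scene_latent):
    """Hypothetical stand-in for a learned denoiser.

    A real model would predict the clean joint control sequence from the
    noised controls, the diffusion step, and the encoded scene. This toy
    version ignores the noise and returns the latent broadcast to shape.
    """
    return np.broadcast_to(scene_latent, noisy_controls.shape).copy()


def sample_joint_controls(denoiser, scene_latent, shape, num_steps=50,
                          guidance_grad=None, guidance_scale=0.1, seed=0):
    """Recover a joint control sequence from Gaussian noise, step by step."""
    rng = np.random.default_rng(seed)
    alpha_bar = make_schedule(num_steps)
    x = rng.standard_normal(shape)  # start from pure noise

    for t in range(num_steps - 1, -1, -1):
        # The denoiser predicts the clean sample directly.
        x0_hat = denoiser(x, t, scene_latent)

        # Optional guidance (assumed form): one gradient step on a
        # user-supplied cost, applied to the clean prediction.
        if guidance_grad is not None:
            x0_hat = x0_hat - guidance_scale * guidance_grad(x0_hat)

        # Noise implied by the current sample and the clean prediction.
        eps_hat = (x - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])

        # Deterministic DDIM-style move to the previous noise level.
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat

    return x


if __name__ == "__main__":
    # Two agents, four time steps, two control channels (assumed layout).
    latent = np.full((2, 4, 2), 0.5)
    controls = sample_joint_controls(toy_denoiser, latent, latent.shape, num_steps=10)
    print(controls.shape)
```

Because the denoiser outputs the whole joint sequence for all agents at once, any guidance cost evaluated on that output can couple agents together, which is what makes scene-level objectives possible in this style of sampler.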
In conclusion, the paper uses the VBD model to overcome the deficiencies of existing traffic simulation models in generating realistic, interactive, and controllable multi-agent behaviors, providing a powerful tool for developing and evaluating autonomous driving systems.