Abstract: Existing traffic simulation models often fail to capture the complexities of real-world scenarios, limiting the effective evaluation of autonomous driving systems. We introduce Versatile Behavior Diffusion (VBD), a novel traffic scenario generation framework that utilizes diffusion generative models to predict scene-consistent and controllable multi-agent interactions in closed-loop settings. VBD achieves state-of-the-art performance on the Waymo Sim Agents Benchmark and can effectively produce realistic and coherent traffic behaviors with complex agent interactions under diverse environmental conditions. Furthermore, VBD offers inference-time scenario editing through multi-step refinement guided by behavior priors and model-based optimization objectives. This capability allows for controllable multi-agent behavior generation, accommodating a wide range of user requirements across various traffic simulation applications. Despite being trained solely on publicly available datasets representing typical traffic conditions, we introduce conflict-prior and game-theoretic guidance approaches that enable the creation of interactive, long-tail safety-critical scenarios, which is essential for comprehensive testing and validation of autonomous vehicles. Lastly, we provide in-depth insights into effective training and inference strategies for diffusion-based traffic scenario generation models, highlighting best practices and common pitfalls. Our work significantly advances the ability to simulate complex traffic environments, offering a powerful tool for the development and assessment of autonomous driving technologies.

What problem does this paper attempt to address?

This paper addresses the generation of realistic and controllable multi-agent interaction behaviors in traffic simulation. Existing traffic simulation models often fail to capture the complexity of real-world scenarios, which limits how effectively autonomous driving systems can be evaluated. Specifically, existing approaches have the following deficiencies in simulation testing:

1. **Lack of interactivity**: Replay methods (such as trajectory replay from driving datasets) usually ignore the reactions of other traffic participants, so they cannot accurately simulate interactive scenarios such as pedestrians crossing the road or vehicles negotiating the right-of-way at four-way stops. This creates a significant gap between simulation and reality: simulation test results may not reflect actual performance.
2. **Model limitations**: Heuristic-based methods are difficult to scale to complex real-world traffic maneuvers (such as navigating around construction areas, reacting to pedestrians who suddenly enter crosswalks, or inching forward at crowded intersections) and cannot comprehensively capture traffic dynamics.
3. **Limitations of generative models**: Although some recent studies use large-scale driving datasets and behavior cloning to model traffic behaviors more accurately, these methods still focus mainly on predicting marginal open-loop trajectories of individual agents. This can lead to scene inconsistencies and even collisions, which undermines the realism and reliability of the simulated scenes.

To solve these problems, the paper proposes a new traffic scenario generation framework named **Versatile Behavior Diffusion (VBD)**.
VBD uses diffusion generative models to predict scene-consistent and controllable multi-agent interaction behaviors. It generates realistic and coherent traffic behaviors in closed-loop settings and supports diverse environmental conditions. VBD also supports scenario editing at inference time: multi-step refinement, guided by behavior priors and model-based optimization objectives, enables controllable multi-agent behavior generation that meets a wide range of user needs across traffic simulation applications.

### Main contributions:

1. **Propose the VBD model**: It generates realistic and controllable traffic agent behaviors and achieves strong closed-loop simulation performance.
2. **Demonstrate the flexibility of VBD**: It generates diverse user-specified scenarios through a flexible guidance scheme that is compatible with optimization objectives, behavior priors, and game-theoretic structures.
3. **In-depth study of training and inference strategies**: Extensive empirical studies explore how various training and inference settings affect multi-agent behavior generation with diffusion models, providing insights for future research.

### Technical details:

- **Diffusion generative model**: VBD uses a diffusion model (also known as a score-based model) to gradually recover structured data from random noise and generate multi-agent interaction behaviors.
- **Scene encoder**: A Transformer encoder with query-centric attention encodes the scene context into a latent representation.
- **Denoiser**: Directly predicts the joint clean control sequence from the latent representation and the noised control sequence.
- **Behavior predictor**: Predicts the marginal categorical trajectory distribution of each agent from the latent representation, using representative static end-point anchors extracted from the data.

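
The denoising and guidance ideas described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`make_schedule`, `sample_controls`, `toy_denoiser`), the linear noise schedule, the 50 steps, the control dimension of 2, and the guidance rule (a plain gradient step on the posterior mean) are all assumptions chosen for illustration. The sketch shows a standard DDPM-style reverse process in which the denoiser outputs a clean joint control sequence for all agents, with an optional cost gradient steering each step.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice); returns betas and cumulative alphas."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def sample_controls(denoiser, context, shape, betas, alpha_bars,
                    cost_grad=None, guidance_scale=0.0, seed=0):
    """Reverse diffusion over joint controls of shape [agents, horizon, control_dim].

    `denoiser(x_t, t, context)` must return a prediction of the clean controls.
    `cost_grad(u)` is an optional, hypothetical gradient of a user-defined cost.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        x0_hat = denoiser(x, t, context)  # predicted clean joint controls
        abar_t = alpha_bars[t]
        abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        alpha_t = 1.0 - betas[t]
        # DDPM posterior q(x_{t-1} | x_t, x0_hat): mean coefficients and variance
        coef_x0 = np.sqrt(abar_prev) * betas[t] / (1.0 - abar_t)
        coef_xt = np.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)
        mean = coef_x0 * x0_hat + coef_xt * x
        if cost_grad is not None:
            # Assumed guidance rule: nudge the mean downhill on the cost
            mean = mean - guidance_scale * cost_grad(mean)
        var = betas[t] * (1.0 - abar_prev) / (1.0 - abar_t)
        noise = rng.standard_normal(shape) if t > 0 else 0.0  # no noise at the last step
        x = mean + np.sqrt(var) * noise
    return x

def toy_denoiser(x_t, t, context):
    """Hypothetical stand-in for a learned network: always returns fixed clean controls."""
    return context["clean_controls"]

betas, alpha_bars = make_schedule()
clean = np.zeros((3, 8, 2))  # 3 agents, 8 steps, 2 control channels (assumed)
clean[..., 0] = 1.0          # constant value in the first control channel
context = {"clean_controls": clean}

plain = sample_controls(toy_denoiser, context, clean.shape, betas, alpha_bars)

target = np.zeros_like(clean)  # hypothetical objective: pull controls toward zero
guided = sample_controls(toy_denoiser, context, clean.shape, betas, alpha_bars,
                         cost_grad=lambda u: u - target, guidance_scale=0.5)
```

With the toy denoiser, the unguided sample ends at the fixed clean controls, while the guided sample is pulled toward `target`. In a real system the denoiser would be a trained network conditioned on the encoded scene, and the cost could express a behavior prior or a collision-related objective.
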
In summary, the paper uses the VBD model to address the shortcomings of existing traffic simulation models in generating realistic, interactive, and controllable multi-agent behaviors, providing a powerful tool for the development and evaluation of autonomous driving systems.

Versatile Behavior Diffusion for Generalized Traffic Agent Simulation

Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion

Data-driven Diffusion Models for Enhancing Safety in Autonomous Vehicle Traffic Simulations

Enhanced Multimodal Trajectory Prediction for Autonomous Vehicles Using Advanced Diffusion Model Techniques

TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction

DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle Testing

TrajGen: Generating Realistic and Diverse Trajectories With Reactive and Feasible Agent Behaviors for Autonomous Driving

DragTraffic: Interactive and Controllable Traffic Scene Generation for Autonomous Driving

SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries

TrafficGamer: Reliable and Flexible Traffic Simulation for Safety-Critical Scenarios with Game-Theoretic Oracles

TrafficGen: Learning to Generate Diverse and Realistic Traffic Scenarios

AA‐FVDM: an Accident‐avoidance Full Velocity Difference Model for Animating Realistic Street‐level Traffic in Rural Scenes

InterSim: Interactive Traffic Simulation Via Explicit Relation Modeling

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

Controllable Traffic Simulation through LLM-Guided Hierarchical Chain-of-Thought Reasoning

WcDT: World-centric Diffusion Transformer for Traffic Scene Generation

A Diffusion-Model of Joint Interactive Navigation

SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models

SurrealDriver: Designing Generative Driver Agent Simulation Framework in Urban Contexts based on Large Language Model

Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting

Immersive Traffic Interactive Simulator for Multi-Agent