GenAD: Generative End-to-End Autonomous Driving

Wenzhao Zheng, Ruiqi Song, Xianda Guo, Chenming Zhang, Long Chen
2024-04-07
Abstract: Directly producing planning results from raw sensors has been a long-desired solution for autonomous driving and has attracted increasing attention recently. Most existing end-to-end autonomous driving methods factorize this problem into perception, motion prediction, and planning. However, we argue that the conventional progressive pipeline still cannot comprehensively model the entire traffic evolution process, e.g., the future interaction between the ego car and other traffic participants and the structural trajectory prior. In this paper, we explore a new paradigm for end-to-end autonomous driving, where the key is to predict how the ego car and the surroundings evolve given past scenes. We propose GenAD, a generative framework that casts autonomous driving into a generative modeling problem. We propose an instance-centric scene tokenizer that first transforms the surrounding scenes into map-aware instance tokens. We then employ a variational autoencoder to learn the future trajectory distribution in a structural latent space for trajectory prior modeling. We further adopt a temporal model to capture the agent and ego movements in the latent space to generate more effective future trajectories. GenAD finally simultaneously performs motion prediction and planning by sampling distributions in the learned structural latent space conditioned on the instance tokens and using the learned temporal model to generate futures. Extensive experiments on the widely used nuScenes benchmark show that the proposed GenAD achieves state-of-the-art performance on vision-centric end-to-end autonomous driving with high efficiency. Code:
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses planning and motion prediction in end-to-end autonomous driving and proposes a generative framework to overcome the limitations of existing methods. Its argument has two parts: a limitation of current pipelines and the proposed remedy.

1. **Interactivity and Structural Issues**: Existing end-to-end autonomous driving methods typically decompose the problem into three sequential stages: perception, motion prediction, and planning. This serial design overlooks future interactions between the ego vehicle and surrounding traffic participants, as well as the structural prior of realistic trajectories (e.g., trajectories are usually continuous and smooth).
2. **Unified Planning and Prediction**: To overcome these issues, the authors propose GenAD (Generative End-to-End Autonomous Driving), which casts autonomous driving as a generative modeling problem. Using an instance-centric scene representation and a structured latent trajectory space, the method performs motion prediction and planning simultaneously, which lets it model interactions between the ego vehicle and other traffic participants while respecting trajectory structure.

In short, the goal of this research is to improve motion prediction and planning in end-to-end autonomous driving with a generative framework that jointly accounts for vehicle-environment interactions and the structural characteristics of trajectories.
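
To make the generative idea concrete, the following is a minimal NumPy sketch of the inference path the abstract describes: sample a latent vector per instance token, then roll it forward with a temporal model to produce future waypoints for the ego car and the other agents together. It is not the authors' implementation. The dimensions, the weight names (`W_mu`, `W_logvar`, `W_h`, `W_out`), the random untrained weights, and the plain tanh recurrence standing in for the temporal model are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: token width, latent width, number of future steps.
D_TOKEN, D_LATENT, HORIZON = 16, 8, 6


def init_params():
    """Random weights standing in for trained networks (illustration only)."""
    return {
        "W_mu": rng.normal(0.0, 0.1, (D_TOKEN, D_LATENT)),
        "W_logvar": rng.normal(0.0, 0.1, (D_TOKEN, D_LATENT)),
        "W_h": rng.normal(0.0, 0.1, (D_LATENT, D_LATENT)),
        "W_out": rng.normal(0.0, 0.1, (D_LATENT, 2)),
    }


def sample_latent(tokens, p):
    """Map each instance token to a Gaussian and draw one sample from it
    using the reparameterization z = mu + sigma * eps."""
    mu = tokens @ p["W_mu"]
    logvar = tokens @ p["W_logvar"]
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def rollout(z, p, horizon=HORIZON):
    """Unroll a simple recurrence in latent space and decode one (dx, dy)
    offset per step; the cumulative sum turns offsets into waypoints."""
    h = z
    steps = []
    for _ in range(horizon):
        h = np.tanh(h @ p["W_h"] + z)  # assumed stand-in for the temporal model
        steps.append(h @ p["W_out"])
    offsets = np.stack(steps, axis=1)  # shape (N, horizon, 2)
    return np.cumsum(offsets, axis=1)


params = init_params()
# Row 0 plays the ego token, rows 1-4 play surrounding-agent tokens.
tokens = rng.standard_normal((5, D_TOKEN))
z = sample_latent(tokens, params)
trajs = rollout(z, params)
ego_plan, agent_preds = trajs[0], trajs[1:]
```

Because the ego token and the agent tokens pass through the same sampling and rollout functions, planning (`ego_plan`) and motion prediction (`agent_preds`) come out of a single forward pass, which is the unification the summary refers to. Decoding per-step offsets and accumulating them is one simple way to bias the output toward continuous trajectories; the paper's actual decoder may differ.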