VAD: Vectorized Scene Representation for Efficient Autonomous Driving

Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang
2023-08-24
Abstract: Autonomous driving requires a comprehensive understanding of the surrounding environment for reliable trajectory planning. Previous works rely on dense rasterized scene representation (e.g., agent occupancy and semantic map) to perform planning, which is computationally intensive and misses the instance-level structure information. In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation. The proposed vectorized paradigm has two significant advantages. On one hand, VAD exploits the vectorized agent motion and map elements as explicit instance-level planning constraints which effectively improves planning safety. On the other hand, VAD runs much faster than previous end-to-end planning methods by getting rid of computation-intensive rasterized representation and hand-designed post-processing steps. VAD achieves state-of-the-art end-to-end planning performance on the nuScenes dataset, outperforming the previous best method by a large margin. Our base model, VAD-Base, greatly reduces the average collision rate by 29.0% and runs 2.5x faster. Besides, a lightweight variant, VAD-Tiny, greatly improves the inference speed (up to 9.3x) while achieving comparable planning performance. We believe the excellent performance and the high efficiency of VAD are critical for the real-world deployment of an autonomous driving system. Code and models are available at https://github.com/hustvl/VAD for facilitating future research.
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses scene understanding and planning for autonomous driving and proposes a method named VAD (Vectorized Autonomous Driving). Traditional autonomous driving systems are modular, with perception and planning decoupled; because the planning module cannot access the raw sensor data directly, planning safety suffers. More recent end-to-end methods map sensor data directly to planning outputs, but most still rely on dense rasterized scene representations (e.g., agent occupancy and semantic maps), which are computationally intensive and lose instance-level structural information. VAD instead proposes an end-to-end vectorized paradigm that abandons the rasterized representation entirely and models the scene with vectorized agent motion and map elements. This has two significant advantages. First, VAD uses the vectorized agent motion and map elements as explicit instance-level planning constraints, which improves planning safety. Second, by avoiding computation-intensive rasterized representations and hand-designed post-processing steps, VAD runs much faster than previous end-to-end planning methods. Specifically, VAD exploits vectorized information in the planning stage through query interaction and vectorized planning constraints. It introduces three instance-level planning constraints: an ego-agent collision constraint, which keeps a safe distance between the ego vehicle and other dynamic agents; an ego-boundary overstepping constraint, which keeps the ego vehicle away from road boundaries; and an ego-lane directional constraint, which regularizes the ego vehicle's future driving direction to follow the lane direction.
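To make the three constraints concrete, the following is a minimal NumPy sketch of how such penalties could be computed on vectorized inputs. It is not the authors' implementation: the function names, the hinge form of the distance penalties, the 1.0 m safety threshold, and the nearest-point lane lookup are all assumptions chosen for illustration.

```python
import numpy as np

def collision_penalty(ego_traj, agent_trajs, safe_dist=1.0):
    """Hinge penalty when the ego gets closer than safe_dist to any agent.

    ego_traj: (T, 2) planned waypoints; agent_trajs: (N, T, 2) predicted
    agent waypoints at the same timesteps. safe_dist is an assumed value.
    """
    dists = np.linalg.norm(agent_trajs - ego_traj[None], axis=-1)  # (N, T)
    nearest = dists.min(axis=0)                                    # (T,)
    return np.maximum(0.0, safe_dist - nearest).mean()

def boundary_penalty(ego_traj, boundary_pts, safe_dist=1.0):
    """Hinge penalty when the ego gets closer than safe_dist to a boundary.

    boundary_pts: (M, 2) points sampled along boundary polylines.
    """
    dists = np.linalg.norm(ego_traj[:, None] - boundary_pts[None], axis=-1)  # (T, M)
    nearest = dists.min(axis=1)                                              # (T,)
    return np.maximum(0.0, safe_dist - nearest).mean()

def direction_penalty(ego_traj, lane_pts, lane_dirs):
    """Mean absolute angle (radians) between ego motion and nearest lane direction.

    lane_pts: (K, 2) points on lane centerlines; lane_dirs: (K, 2) direction
    vectors at those points.
    """
    steps = np.diff(ego_traj, axis=0)                        # (T-1, 2)
    ego_heading = np.arctan2(steps[:, 1], steps[:, 0])
    # Look up the closest lane point for the start of each step.
    idx = np.linalg.norm(
        ego_traj[:-1, None] - lane_pts[None], axis=-1
    ).argmin(axis=1)
    lane_heading = np.arctan2(lane_dirs[idx, 1], lane_dirs[idx, 0])
    delta = ego_heading - lane_heading
    # Wrap the difference into [-pi, pi] before taking its magnitude.
    return np.abs(np.arctan2(np.sin(delta), np.cos(delta))).mean()

# Toy scene: the ego drives straight along the x-axis.
ego = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
far_agent = np.array([[[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]]])
near_agent = np.array([[[0.0, 0.5], [1.0, 0.5], [2.0, 0.5]]])
lane_pts = np.array([[0.0, 0.0], [1.0, 0.0]])
aligned_dirs = np.array([[1.0, 0.0], [1.0, 0.0]])
crossing_dirs = np.array([[0.0, 1.0], [0.0, 1.0]])

print(collision_penalty(ego, far_agent), collision_penalty(ego, near_agent))
print(direction_penalty(ego, lane_pts, aligned_dirs),
      direction_penalty(ego, lane_pts, crossing_dirs))
```

Because agents and map elements are stored as small arrays of points rather than dense grids, each penalty reduces to a few pairwise distance or angle computations per instance, which is the kind of instance-level structure a rasterized map does not expose directly.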
Experimental results show that VAD achieves state-of-the-art end-to-end planning performance on the nuScenes dataset: relative to the previous best method, VAD-Base reduces the average collision rate by 29.0% and the average planning displacement error by 30.1% while running 2.5x faster. A lightweight variant, VAD-Tiny, raises inference speed by up to 9.3x while maintaining comparable planning performance. In summary, VAD demonstrates the potential of vectorized scene representation to improve both the planning performance and the efficiency of autonomous driving systems, which matters for practical deployment.