Abstract: Autonomous driving requires a comprehensive understanding of the surrounding environment for reliable trajectory planning. Previous works rely on dense rasterized scene representation (e.g., agent occupancy and semantic map) to perform planning, which is computationally intensive and misses the instance-level structure information. In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation. The proposed vectorized paradigm has two significant advantages. On one hand, VAD exploits the vectorized agent motion and map elements as explicit instance-level planning constraints, which effectively improves planning safety. On the other hand, VAD runs much faster than previous end-to-end planning methods by getting rid of computation-intensive rasterized representation and hand-designed post-processing steps. VAD achieves state-of-the-art end-to-end planning performance on the nuScenes dataset, outperforming the previous best method by a large margin. Our base model, VAD-Base, greatly reduces the average collision rate by 29.0% and runs 2.5x faster. Besides, a lightweight variant, VAD-Tiny, greatly improves the inference speed (up to 9.3x) while achieving comparable planning performance. We believe the excellent performance and the high efficiency of VAD are critical for the real-world deployment of an autonomous driving system. Code and models are available at https://github.com/hustvl/VAD to facilitate future research.

What problem does this paper attempt to address?

The paper addresses scene understanding and planning for autonomous driving and proposes a method named VAD (Vectorized Autonomous Driving). Conventional autonomous driving systems are modular, with perception and planning built as separate stages. In that design the planning module cannot directly access the raw sensor data, which can reduce planning safety. More recent end-to-end methods take sensor data as input and output planning results directly, but most still rely on dense rasterized scene representations (for example, agent occupancy and semantic maps), which are computationally intensive and discard instance-level structural information.
VAD instead models the driving scene with a fully vectorized representation consisting of vectorized agent motion and vectorized map elements, dropping the rasterized representation altogether. This has two main advantages. First, the vectorized agent motion and map elements serve as explicit instance-level planning constraints, which improves planning safety. Second, without the computation-intensive rasterized representation and hand-designed post-processing steps, VAD runs much faster than previous end-to-end planning methods.
During planning, VAD uses the vectorized information in two ways: through query interaction and through vectorized planning constraints. It introduces three instance-level constraints. The ego-agent collision constraint keeps a safe distance between the ego vehicle and other dynamic agents. The ego-boundary overstepping constraint keeps the planned trajectory away from road boundaries. The ego-lane directional constraint regularizes the ego vehicle's future driving direction so that it conforms to the lane direction.
Experiments show that VAD achieves state-of-the-art end-to-end planning performance on the nuScenes dataset. Compared with the previous best method, VAD-Base reduces the average collision rate by 29.0% and the average planning displacement error by 30.1% while running 2.5x faster. The lightweight variant, VAD-Tiny, achieves comparable planning performance with up to 9.3x faster inference. Overall, the results suggest that a vectorized scene representation can improve both the planning performance and the efficiency of autonomous driving systems, which matters for real-world deployment.
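To make the three instance-level constraints described above more concrete, here is a minimal sketch of how they could be written as penalties over vectorized inputs. It is not the VAD implementation. The function names, the hinge form of the distance penalties, the nearest-vertex and nearest-segment-midpoint lookups, and every threshold value are assumptions chosen for illustration. Trajectories and map elements are plain lists of (x, y) points in the ego frame, which is what "vectorized" means in this sketch.

```python
import math

# Illustrative sketch only: names, thresholds, and simplifications are
# assumptions, not taken from the VAD code base.


def hinge(distance, threshold):
    """Linear penalty that is positive only when distance < threshold."""
    return max(0.0, threshold - distance)


def collision_penalty(ego_traj, agent_trajs, threshold):
    """Ego-agent collision: penalise timesteps where the nearest predicted
    agent position is closer to the planned ego position than threshold."""
    total = 0.0
    for t, (ex, ey) in enumerate(ego_traj):
        nearest = min(
            (math.hypot(ex - a[t][0], ey - a[t][1]) for a in agent_trajs),
            default=math.inf,  # no agents means no penalty
        )
        total += hinge(nearest, threshold)
    return total / len(ego_traj)


def boundary_penalty(ego_traj, boundaries, threshold):
    """Ego-boundary overstepping: penalise planned positions that come
    within threshold of any boundary polyline vertex (simplification:
    distance to vertices, not to the segments between them)."""
    total = 0.0
    for ex, ey in ego_traj:
        nearest = min(
            (math.hypot(ex - bx, ey - by)
             for line in boundaries for bx, by in line),
            default=math.inf,
        )
        total += hinge(nearest, threshold)
    return total / len(ego_traj)


def direction_penalty(ego_traj, lanes, radius):
    """Ego-lane direction: mean absolute angle (radians) between each ego
    motion step and the nearest lane segment whose midpoint lies within
    radius. Steps with no nearby lane segment contribute nothing."""
    total = 0.0
    prev = (0.0, 0.0)  # assumed: ego starts at the origin of its own frame
    for cur in ego_traj:
        heading = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        best = None  # (distance, lane angle) of the nearest segment so far
        for line in lanes:
            for p, q in zip(line, line[1:]):
                mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
                d = math.hypot(cur[0] - mid[0], cur[1] - mid[1])
                if d <= radius and (best is None or d < best[0]):
                    best = (d, math.atan2(q[1] - p[1], q[0] - p[0]))
        if best is not None:
            diff = heading - best[1]
            # wrap the difference into [0, pi]
            total += abs(math.atan2(math.sin(diff), math.cos(diff)))
        prev = cur
    return total / len(ego_traj)


if __name__ == "__main__":
    # Hypothetical scene: ego drives straight ahead along +y.
    ego = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
    stopped_agent = [(0.0, 5.0)] * 3           # one agent, three timesteps
    right_boundary = [(2.0, 0.0), (2.0, 3.0)]  # boundary polyline
    lane_divider = [(1.0, 0.0), (1.0, 4.0)]    # lane polyline pointing +y
    print(collision_penalty(ego, [stopped_agent], threshold=3.0))
    print(boundary_penalty(ego, [right_boundary], threshold=1.0))
    print(direction_penalty(ego, [lane_divider], radius=2.0))
```

Because every input is a short list of points, each penalty costs a handful of distance computations per timestep rather than a pass over a dense grid, which is the intuition behind the efficiency argument in the summary.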

VAD: Vectorized Scene Representation for Efficient Autonomous Driving

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation

GenAD: Generative End-to-End Autonomous Driving

VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation

SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

Planning-oriented Autonomous Driving

GaussianAD: Gaussian-Centric End-to-End Autonomous Driving

VLP: Vision Language Planning for Autonomous Driving

SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation

Mitigating Causal Confusion in Vector-Based Behavior Cloning for Safer Autonomous Planning

VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision

Real-time path planning for autonomous vehicle off-road driving