End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation

Mingzhe Guo, Zhipeng Zhang, Yuan He, Ke Wang, Liping Jing
2024-06-26
Abstract: We propose UAD, a method for vision-based end-to-end autonomous driving (E2EAD) that achieves the best open-loop evaluation performance on nuScenes while also showing robust closed-loop driving quality in CARLA. Our motivation stems from the observation that current E2EAD models still mimic the modular architecture of typical driving stacks, with carefully designed supervised perception and prediction subtasks that provide environment information for planning. Although such designs have achieved groundbreaking progress, they have certain drawbacks: 1) the preceding subtasks require massive amounts of high-quality 3D annotations as supervision, which is a significant impediment to scaling the training data; 2) each submodule entails substantial computational overhead in both training and inference. To this end, we propose UAD, an E2EAD framework with an unsupervised proxy that addresses all of these issues. First, we design a novel Angular Perception Pretext to eliminate the annotation requirement. The pretext models the driving scene by predicting angular-wise spatial objectness and temporal dynamics, without manual annotation. Second, we propose a self-supervised training strategy that learns the consistency of the predicted trajectories under different augmented views, in order to enhance planning robustness in steering scenarios. UAD achieves a 38.7% relative improvement over UniAD on the average collision rate in nuScenes and surpasses VAD by 41.32 points on the driving score in CARLA's Town05 Long benchmark. Moreover, the proposed method consumes only 44.3% of UniAD's training resources and runs 3.4 times faster in inference. Our design not only demonstrates, for the first time, unarguable performance advantages over supervised counterparts, but also enjoys unprecedented efficiency in data, training, and inference.
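The abstract mixes relative and absolute comparisons. The abstract does not spell out its definition of relative reduction, so the following assumes the standard one, with $\bar{c}$ denoting the average collision rate. Under that assumption, the nuScenes figure reads:

$$
\Delta_{\text{rel}} = \frac{\bar{c}_{\text{UniAD}} - \bar{c}_{\text{UAD}}}{\bar{c}_{\text{UniAD}}} = 0.387
\quad\Longrightarrow\quad
\bar{c}_{\text{UAD}} = (1 - 0.387)\,\bar{c}_{\text{UniAD}} = 0.613\,\bar{c}_{\text{UniAD}}
$$

By contrast, the 41.32-point CARLA gain is an absolute difference in driving score. If "3.4 times faster" refers to throughput, it corresponds to a per-frame latency of roughly $1/3.4 \approx 0.29$ of UniAD's.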
Code and models will be released at https://github.com/KargoBot_Research/UAD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses several key issues in end-to-end autonomous driving (E2EAD):

1. **High-cost modular design**: Current E2EAD models typically mimic the architecture of traditional driving stacks. They use carefully designed supervised perception and prediction subtasks to provide environmental information for planning. Although this design has achieved breakthrough progress, it has two drawbacks:
   - **High annotation demand**: The preceding subtasks require large amounts of high-quality 3D annotations as supervision, which is a significant barrier to scaling up training data.
   - **High computational overhead**: Each submodule consumes substantial computation during both training and inference.
2. **Dependence on 3D annotations**: Existing E2EAD methods rely heavily on 3D annotations. This raises data-labeling costs and also limits the use of large-scale data.

### Solution

To address these issues, the authors propose UAD (Unsupervised Autonomous Driving), a vision-based end-to-end autonomous driving framework with the following main features:

1. **Unsupervised pretext task**: UAD introduces an unsupervised pretext task, the Angular Perception Pretext, which eliminates the need for 3D annotations. It models the driving scene by predicting the objectness and temporal dynamics of each angular sector in the bird's-eye-view (BEV) space, without manual annotation.
2. **Self-supervised training strategy**: To make planning more robust in steering scenarios, UAD uses a self-supervised training strategy that learns the consistency of predicted trajectories under different augmented views.
3. **Efficient computation and data utilization**: UAD substantially reduces computational cost during training and inference and removes the need for 3D annotations entirely, which improves data efficiency.
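The summary describes the angular pretext only at a high level. The following minimal sketch shows what a per-sector objectness target could look like. It splits the BEV plane around the ego vehicle into `num_sectors` equal angular sectors and marks a sector as occupied when any object's azimuth span overlaps it. Several details are assumptions for illustration, not UAD's published implementation:

- the helper name `sector_objectness`;
- the counter-clockwise angle convention;
- the idea that objects arrive as azimuth spans, for example from a pretrained 2D detector rather than 3D labels.

The temporal-dynamics half of the pretext is not shown.

```python
import math
from typing import List, Sequence, Tuple


def sector_objectness(
    spans: Sequence[Tuple[float, float]], num_sectors: int
) -> List[int]:
    """Binary objectness per angular BEV sector (hypothetical helper).

    spans: (start, end) azimuths in radians, measured counter-clockwise around
    the ego vehicle (an assumed convention). A span may wrap past 2*pi but must
    be shorter than a full turn.
    Sector k covers the half-open interval [k * width, (k + 1) * width).
    """
    two_pi = 2 * math.pi
    width = two_pi / num_sectors
    labels = [0] * num_sectors
    for start, end in spans:
        start %= two_pi                   # normalise start into [0, 2*pi)
        length = (end - start) % two_pi   # angular extent, handling wrap-around
        first = math.floor(start / width)
        # Last sector touched; max() keeps zero-width spans in their own sector.
        last = max(first, math.ceil((start + length) / width) - 1)
        for k in range(first, last + 1):
            labels[k % num_sectors] = 1   # modulo wraps sectors past 2*pi to 0
    return labels
```

For example, with 8 sectors, a span from 0.1 to 0.2 rad together with a span from just past 90° to just short of 180° marks sectors 0, 2, and 3 as occupied. A span from `2*pi - 0.1` to `0.1` wraps around and marks sectors 7 and 0. Using angular sectors instead of metric BEV cells means the target needs only object directions, not depths.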
### Experimental Results

- **Open-loop evaluation**: On the nuScenes dataset, UAD reduces the average collision rate by 38.7% relative to UniAD and by 55.2% relative to VAD.
- **Closed-loop evaluation**: On the Town05 Long benchmark in the CARLA simulator, UAD's driving score is 41.32 points higher than VAD's, and its route completion rate is 19.24% higher.

### Contributions

1. **Unsupervised pretext task**: UAD eliminates the need for 3D annotations through an unsupervised pretext task, allowing training data to scale to billions without additional annotation burden.
2. **Direction-aware self-supervised learning strategy**: A new direction-aware self-supervised strategy enhances planning robustness by maximizing the consistency of predicted trajectories under different augmented views.
3. **Superior performance**: UAD outperforms other vision-based E2EAD methods in both open-loop and closed-loop evaluations, at lower computational and annotation cost.
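The following minimal sketch illustrates the direction-aware consistency idea under two assumptions that the summary does not specify:

- the augmentation is a rotation of the BEV scene about the ego vehicle;
- the loss is the mean Euclidean distance between corresponding waypoints.

The names `rotation_matrix`, `rotate_points`, and `direction_consistency_loss` are hypothetical. In training, the two trajectories would come from the planner applied to the original and the rotated views; here they are supplied directly.

```python
import numpy as np


def rotation_matrix(theta: float) -> np.ndarray:
    """2x2 matrix rotating BEV points counter-clockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def rotate_points(points: np.ndarray, theta: float) -> np.ndarray:
    """Rotate an (N, 2) array of row-vector (x, y) points by theta."""
    return points @ rotation_matrix(theta).T


def direction_consistency_loss(
    traj_orig: np.ndarray, traj_from_rotated: np.ndarray, theta: float
) -> float:
    """Mean Euclidean gap between two plans once the augmentation is undone.

    traj_orig: (T, 2) waypoints planned from the original view.
    traj_from_rotated: (T, 2) waypoints planned from the view rotated by theta.
    The rotation augmentation and the L2 distance are assumptions of this sketch.
    """
    restored = rotate_points(traj_from_rotated, -theta)  # back to original frame
    return float(np.mean(np.linalg.norm(traj_orig - restored, axis=1)))


if __name__ == "__main__":
    plan = np.array([[0.0, 1.0], [0.2, 2.0], [0.5, 3.0]])
    theta = np.deg2rad(30.0)
    # A planner whose output turns with the scene incurs (numerically) zero loss.
    print(direction_consistency_loss(plan, rotate_points(plan, theta), theta))
    # A planner that ignores the rotation is penalised.
    print(direction_consistency_loss(plan, plan, theta))
```

Rotating the second plan back by -θ before the comparison is what makes the objective direction-aware. The loss vanishes only when the planned trajectory rotates together with the scene, which is the behaviour the strategy aims to encourage in steering scenarios.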