Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes

Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, Jingdong Wang
2023-10-22
Abstract: Modern autonomous driving systems are typically divided into three main tasks: perception, prediction, and planning. The planning task involves predicting the trajectory of the ego vehicle based on inputs from both internal intention and the external environment, and manipulating the vehicle accordingly. Most existing works evaluate their performance on the nuScenes dataset using the L2 error and collision rate between the predicted trajectories and the ground truth. In this paper, we reevaluate these existing evaluation metrics and explore whether they accurately measure the superiority of different methods. Specifically, we design an MLP-based method that takes raw sensor data (e.g., past trajectory, velocity, etc.) as input and directly outputs the future trajectory of the ego vehicle, without using any perception or prediction information such as camera images or LiDAR. Our simple method achieves end-to-end planning performance on the nuScenes dataset similar to that of other perception-based methods, reducing the average L2 error by about 20%. Meanwhile, the perception-based methods have an advantage in terms of collision rate. We further conduct in-depth analysis and provide new insights into the factors that are critical for the success of the planning task on the nuScenes dataset. Our observation also indicates that we need to rethink the current open-loop evaluation scheme of end-to-end autonomous driving in nuScenes. Code is available at https://github.com/E2E-AD/AD-MLP.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper examines the open-loop evaluation of end-to-end autonomous driving systems on the nuScenes dataset and asks whether the existing metrics can reliably rank different approaches. nuScenes is a large-scale multimodal dataset commonly used in autonomous driving research, with rich annotations of vehicles, pedestrians, and other dynamic elements. The authors design a simple model based on a multi-layer perceptron (MLP) that takes only the physical state of the ego vehicle (past trajectory, speed, acceleration, and so on) as input and uses no perception information from cameras or LiDAR, such as images or point clouds. This contrasts with most existing methods, which rely on complex perception tasks (such as 3D object detection and semantic segmentation) to obtain spatiotemporal information about the surrounding environment and use it to support planning decisions. In their experiments, the authors find that even without perception information the model achieves comparable end-to-end planning performance on nuScenes and reduces the average L2 error by about 20%. Perception-based methods, however, perform better on the collision rate metric. This suggests that the current metrics do not fully capture what makes a planner succeed, particularly when a method that ignores the scene entirely can still score well. The paper also analyzes the distribution of ego vehicle states in nuScenes and finds that, over short time horizons, the vehicle's motion is dominated by straight-line driving and small-angle turns, which may explain why physical state information alone yields good planning results.
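To make the setup concrete, here is a minimal sketch of an ego-state-only planner and of the average L2 error used to score it. It is not the authors' implementation: the layer sizes, the 12-value state layout, the six-waypoint horizon, and all numbers are hypothetical, and the weights are random rather than trained, so the printed error is meaningless as a result. The metric is read here simply as the mean Euclidean distance between matching waypoints.

```python
import math
import random


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, inputs):
    """Compute weights @ inputs + bias using plain Python lists."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def plan(ego_state, layers, horizon):
    """Map a flat ego-state vector to `horizon` future (x, y) waypoints."""
    hidden = ego_state
    for weights, bias in layers[:-1]:
        hidden = [max(0.0, v) for v in linear(weights, bias, hidden)]  # ReLU
    weights, bias = layers[-1]
    flat = linear(weights, bias, hidden)
    return [(flat[2 * t], flat[2 * t + 1]) for t in range(horizon)]


def average_l2_error(predicted, ground_truth):
    """Mean Euclidean distance (metres) between matching (x, y) waypoints."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [math.hypot(px - gx, py - gy)
                 for (px, py), (gx, gy) in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


rng = random.Random(0)
HORIZON = 6  # hypothetical number of future waypoints

# Hypothetical ego state: four past (x, y) positions, velocity, acceleration.
past_xy = [0.0, -8.0, 0.0, -6.0, 0.0, -4.0, 0.0, -2.0]
velocity = [0.0, 4.0]
acceleration = [0.0, 0.0]
ego_state = past_xy + velocity + acceleration  # 12 values, no camera or LiDAR

# Hypothetical architecture: one hidden layer of 32 units, untrained weights.
layers = [init_layer(32, len(ego_state), rng),
          init_layer(2 * HORIZON, 32, rng)]

predicted = plan(ego_state, layers, HORIZON)
ground_truth = [(0.0, 2.0 * (t + 1)) for t in range(HORIZON)]  # straight ahead
print("average L2 error (untrained):", average_l2_error(predicted, ground_truth))
```

Note that nothing in the input describes other road users, which is why a low L2 error from such a model says little about how well it accounts for the surrounding scene.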
The authors also point out that the current scheme for computing collision rates, which relies on an occupancy map with a fixed grid size, is flawed and can produce false positives, particularly when the vehicle is close to small objects. In summary, the paper's core contribution is a re-examination of the evaluation standards for end-to-end autonomous driving: it argues that the existing open-loop evaluation scheme needs to be rethought and that future research should analyze the planning task in more depth to improve the safety and efficiency of autonomous driving technology.
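The false-positive issue can be illustrated with a small sketch. It assumes one plausible mechanism, namely that both the ego footprint and the obstacle are rasterized onto a grid and a collision is reported whenever they share a cell; the summary does not describe the actual evaluation code, so the 0.5 m cell size, the axis-aligned boxes, and all coordinates below are hypothetical.

```python
import math


def cell_of(x, y, cell_size):
    """Index of the grid cell that contains the point (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def cells_of_box(box, cell_size):
    """All grid cells touched by an axis-aligned box (x_min, y_min, x_max, y_max)."""
    i0, j0 = cell_of(box[0], box[1], cell_size)
    i1, j1 = cell_of(box[2], box[3], cell_size)
    return {(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)}


def grid_collision(ego, obstacle, cell_size):
    """Grid-based check: collision if the two boxes share any occupied cell."""
    return bool(cells_of_box(ego, cell_size) & cells_of_box(obstacle, cell_size))


def exact_collision(ego, obstacle):
    """Exact check: do the two axis-aligned boxes actually overlap?"""
    return (ego[0] < obstacle[2] and obstacle[0] < ego[2]
            and ego[1] < obstacle[3] and obstacle[1] < ego[3])


CELL_SIZE = 0.5                      # hypothetical grid resolution in metres
ego = (0.0, 0.0, 1.8, 4.0)           # hypothetical ego footprint
small_object = (1.9, 1.0, 2.2, 1.3)  # small object 0.1 m to the side of the ego

print("grid-based collision:", grid_collision(ego, small_object, CELL_SIZE))
print("exact collision:     ", exact_collision(ego, small_object))
```

In this toy case the ego edge at x = 1.8 m and the object edge at x = 1.9 m fall into the same 0.5 m cell, so the grid check reports a collision although the boxes never touch. The smaller the object relative to the cell, the more of its rasterized footprint is empty space, which is consistent with the observation that false positives arise near small objects.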