Abstract: Open-world autonomous driving encompasses domain generalization and open-vocabulary. Domain generalization refers to the capabilities of autonomous driving systems across different scenarios and sensor parameter configurations. Open vocabulary pertains to the ability to recognize various semantic categories not encountered during training. In this paper, we introduce OpenAD, the first real-world open-world autonomous driving benchmark for 3D object detection. OpenAD is built on a corner case discovery and annotation pipeline integrating with a multimodal large language model (MLLM). The proposed pipeline annotates corner case objects in a unified format for five autonomous driving perception datasets with 2000 scenarios. In addition, we devise evaluation methodologies and evaluate various 2D and 3D open-world and specialized models. Moreover, we propose a vision-centric 3D open-world object detection baseline and further introduce an ensemble method by fusing general and specialized models to address the issue of lower precision in existing open-world methods for the OpenAD benchmark. Annotations, toolkit code, and all evaluation codes will be released.

What problem does this paper attempt to address?

This paper addresses the challenges of 3D object detection in open-world autonomous driving. Specifically, it focuses on two key issues, **domain generalization** and **open vocabulary**:

1. **Domain generalization**: How well the autonomous driving system performs across different scenarios and sensor parameter configurations. Existing models perform poorly when they encounter unseen scenarios, which limits their reliability and robustness in practical applications.
2. **Open vocabulary**: The model's ability to recognize semantic categories that it did not encounter during training. This is crucial for subsequent reasoning and planning, such as determining whether an object is collidable, whether it will move suddenly, or whether it indicates that certain areas are impassable.

To address these problems, the authors propose **OpenAD**, an open-world autonomous driving benchmark for 3D object detection. The main features of OpenAD include:

- **Richly annotated data**: It contains 2,000 scenes from five autonomous driving perception datasets, with thousands of corner-case objects annotated.
- **Annotation pipeline integrated with a multimodal large language model (MLLM)**: It is used to automatically identify and annotate corner-case objects.
- **Evaluation methods**: New evaluation metrics are designed to evaluate the model's domain generalization ability and open-vocabulary ability.

Through these efforts, OpenAD aims to fill the gaps in existing 3D perception datasets and provide a more comprehensive and challenging benchmark to promote the development of open-world autonomous driving technology.

### Formula summary

The formulas involved in the paper are mainly used for the calculation of evaluation metrics, such as:

- Calculation of **Average Precision (AP)** and **Average Recall (AR)**:

  \[ \text{AP}=\frac{\sum_{i = 1}^{N}\text{TP}_i}{\sum_{i = 1}^{N}(\text{TP}_i+\text{FP}_i)} \]

  \[ \text{AR}=\frac{\sum_{i = 1}^{N}\text{TP}_i}{\sum_{i = 1}^{N}(\text{TP}_i+\text{FN}_i)} \]

  where $\text{TP}$ represents true positive, $\text{FP}$ represents false positive, and $\text{FN}$ represents false negative.
- **Position threshold and semantic similarity threshold**:
  - For 2D object detection, the Intersection over Union (IoU) is used as the position score, and the cosine similarity is used as the semantic score.
  - For 3D object detection, the center distance is used as the position score, and the cosine similarity is also used as the semantic score.

These formulas ensure a comprehensive evaluation of the model's performance, especially its performance when dealing with unseen categories and scenarios.
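
The 3D case above can be illustrated with a minimal Python sketch. It is not the OpenAD toolkit: the `Box3D` type, the greedy one-to-one matching, and the threshold values (`max_dist=2.0`, `min_sim=0.5`) are all assumptions made for illustration. A prediction counts as a true positive here only when its center distance to an unmatched ground-truth box is within the position threshold and the cosine similarity of their category embeddings reaches the semantic threshold. The final function pools the counts exactly as the two formulas are written in this summary; how the paper averages across thresholds is not specified here.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Box3D:
    # Hypothetical container, not a type from the OpenAD toolkit.
    center: Sequence[float]     # (x, y, z) box center
    embedding: Sequence[float]  # embedding of the category name


def cosine_similarity(a, b):
    """Semantic score: cosine of the angle between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def count_matches(preds, gts, max_dist=2.0, min_sim=0.5):
    """Greedy one-to-one matching; returns (TP, FP, FN).

    The thresholds are illustrative assumptions, not values from the paper.
    """
    unmatched = list(range(len(gts)))
    tp = 0
    for pred in preds:
        best, best_dist = None, max_dist
        for j in unmatched:
            dist = math.dist(pred.center, gts[j].center)  # position score
            sim = cosine_similarity(pred.embedding, gts[j].embedding)
            if dist <= best_dist and sim >= min_sim:
                best, best_dist = j, dist
        if best is not None:
            unmatched.remove(best)
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def pooled_precision_recall(counts):
    """Apply the two formulas above to a list of (TP, FP, FN) triples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For the 2D case, the same structure would apply with IoU replacing center distance as the position score (and the comparison flipped, since a higher IoU means a better match).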

OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
