Abstract: Camera-only Bird's Eye View (BEV) perception has demonstrated great potential for perceiving the environment in 3D space. However, most existing studies were conducted under a supervised setup, which does not scale well to the variety of new data encountered in practice. Unsupervised domain adaptive BEV, which enables effective learning from diverse unlabelled target data, remains largely under-explored. In this work, we design DA-BEV, the first domain adaptive camera-only BEV framework; it addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features. DA-BEV introduces the idea of queries into the domain adaptation framework to derive useful information from image-view and BEV features. It consists of two query-based designs, namely query-based adversarial learning (QAL) and query-based self-training (QST), which exploit image-view features or BEV features to regularize the adaptation of the other. Extensive experiments show that DA-BEV consistently achieves superior domain adaptive BEV perception performance across multiple datasets and tasks such as 3D object detection and 3D scene segmentation.
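The query idea in the abstract can be illustrated with generic cross-attention pooling. The sketch below is a minimal NumPy illustration under stated assumptions, not DA-BEV's implementation: a small set of hypothetical learnable query vectors attends over flattened image-view tokens and, separately, over flattened BEV tokens, so both feature types are summarized into query-aligned vectors of the same shape that a domain adaptation loss could then work with. The function name `query_pool`, the token counts, and the feature dimension are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def query_pool(queries, features):
    """Hypothetical query-based aggregation (generic cross-attention pooling).

    queries:  (Q, D) learnable query embeddings (assumed shapes)
    features: (N, D) flattened image-view or BEV feature tokens
    returns:  (Q, D), one aggregated feature vector per query
    """
    d = queries.shape[-1]
    # Scaled dot-product attention scores between each query and each token.
    scores = queries @ features.T / np.sqrt(d)  # (Q, N)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    # Each output row is a convex combination of the input tokens.
    return weights @ features                   # (Q, D)

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 16))                  # 4 hypothetical queries, D = 16
image_view_tokens = rng.normal(size=(6 * 100, 16))  # e.g. 6 cameras x 100 tokens each
bev_tokens = rng.normal(size=(50 * 50, 16))         # e.g. a 50 x 50 BEV grid
iv_q = query_pool(queries, image_view_tokens)
bev_q = query_pool(queries, bev_tokens)
print(iv_q.shape, bev_q.shape)  # (4, 16) (4, 16)
```

Because the same queries summarize both feature types into tensors of identical shape, information from one view can be compared with or used to regularize the other; how DA-BEV actually wires its queries may differ from this sketch.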

What problem does this paper attempt to address?

The paper addresses unsupervised domain adaptation for Bird's Eye View (BEV) perception that uses camera data only. The central question is how to make a BEV perception model trained on one dataset perform well on a different, unseen dataset for which no annotations are available.

To solve this problem, the authors propose a framework named DA-BEV. It reduces cross-domain differences by exploiting the complementarity between image-view features (extracted from the multi-camera images) and BEV features (which fuse the image-view features with camera configuration information). The method introduces a learnable query mechanism that mediates the interaction between image-view and BEV features, and builds two query-based designs on top of it, each using information from one type of feature to regularize the training of the other:

- **Query-based Adversarial Learning (QAL)**: uses useful information from image-view features or BEV features to regulate the adversarial learning of the other type of feature.
- **Query-based Self-Training (QST)**: uses information from both image-view and BEV features to guide self-training, improving the model's adaptability to target-domain data.

With these techniques, DA-BEV shows significant performance improvements across different datasets and tasks, notably 3D object detection and 3D scene segmentation. The experiments further show that in challenging cross-domain scenarios, such as changes in lighting, weather, and urban environment, DA-BEV improves significantly over the baselines and other advanced domain adaptation methods.
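The two query-based designs can be sketched with common building blocks, again as a hedged illustration rather than the paper's code. The adversarial part assumes a hypothetical linear domain discriminator `w` on query features: the discriminator minimizes a binary cross-entropy for telling source from target, while the feature extractor is trained on the negated loss (the effect of a gradient-reversal layer). The self-training part assumes a simple, commonly used filtering rule that the source does not confirm: a target prediction becomes a pseudo label only if the BEV head is confident and the image-view head predicts the same class. All function names and the 0.7 threshold are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def domain_bce(logits, is_target):
    """Binary cross-entropy of a domain discriminator (label 1 = target domain)."""
    p = np.clip(sigmoid(logits), 1e-7, 1 - 1e-7)
    y = 1.0 if is_target else 0.0
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def adversarial_losses(w, src_feats, tgt_feats):
    """Hypothetical adversarial alignment on query features with a linear discriminator w.

    The discriminator minimizes disc_loss; the feature extractor is trained on
    -disc_loss (what a gradient-reversal layer implements), pushing source and
    target query features towards being indistinguishable.
    """
    disc_loss = domain_bce(src_feats @ w, False) + domain_bce(tgt_feats @ w, True)
    return disc_loss, -disc_loss

def select_pseudo_labels(bev_probs, iv_probs, threshold=0.7):
    """Hypothetical cross-view pseudo-label filter (an assumption, not the paper's QST).

    bev_probs, iv_probs: (M, C) class probabilities from BEV and image-view heads.
    Keeps predictions where the BEV head is confident and both heads agree.
    """
    bev_cls = bev_probs.argmax(axis=1)
    iv_cls = iv_probs.argmax(axis=1)
    confident = bev_probs.max(axis=1) >= threshold
    keep = np.flatnonzero(confident & (bev_cls == iv_cls))
    return keep, bev_cls[keep]

# Toy usage with random stand-ins for source/target query features.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
src = rng.normal(size=(32, 8))
tgt = rng.normal(loc=0.5, size=(32, 8))
d_loss, f_loss = adversarial_losses(w, src, tgt)

bev = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
iv = np.array([[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
keep, labels = select_pseudo_labels(bev, iv)
print(keep, labels)  # [0 2] [0 1]
```

Here the second prediction is dropped because the BEV head is not confident enough, while the other two pass both checks. Requiring cross-view agreement is one simple way of letting one feature type regularize the other; the paper's actual QAL and QST formulations may differ.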

DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation

BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection

Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

Depth-Assisted Camera-Based Bird's Eye View Perception for Autonomous Driving

SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection

DA-BEV: Depth Aware BEV Transformer for 3D Object Detection

Bird's Eye View Perception for Autonomous Driving

BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe

BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird's-Eye-View in Dynamic Scenarios

DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

Vision-Driven 2D Supervised Fine-Tuning Framework for Bird's Eye View Perception

BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection