DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu
2024-08-13
Abstract: Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space. However, most existing studies were conducted under a supervised setup, which cannot scale well when handling various new data. Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, is far under-explored. In this work, we design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features. DA-BEV introduces the idea of query into the domain adaptation framework to derive useful information from image-view and BEV features. It consists of two query-based designs, namely query-based adversarial learning (QAL) and query-based self-training (QST), which exploit image-view features or BEV features to regularize the adaptation of the other. Extensive experiments show that DA-BEV achieves superior domain adaptive BEV perception performance consistently across multiple datasets and tasks such as 3D object detection and 3D scene segmentation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses domain adaptation for camera-only Bird's Eye View (BEV) perception, carried out in an unsupervised manner. Specifically, it studies how a BEV perception model trained on one dataset can perform well on another, unseen dataset, even when the latter has no annotations.

To solve this problem, the authors propose a new framework named DA-BEV. The framework reduces cross-domain differences by leveraging the complementarity between image-view features (extracted from multi-camera images) and BEV features (which fuse image-view features with camera configuration information). It introduces a learnable query mechanism to facilitate the interaction between the two feature types, and designs two query-based methods in which information from one type of feature regularizes the training of the other:

- **Query-based Adversarial Learning (QAL)**: Uses information from image-view features or BEV features to regularize the adversarial learning of the other type of feature.
- **Query-based Self-Training (QST)**: Uses information from image-view features and BEV features to guide the self-training process, thereby improving the model's adaptability to target-domain data.

With these techniques, DA-BEV shows significant performance improvements across different datasets and tasks, including 3D object detection and 3D scene segmentation. The experimental results also show that, in challenging cross-domain scenarios such as changes in lighting conditions, weather conditions, and urban environments, DA-BEV improves markedly over baselines and other advanced domain adaptation methods.
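
As a rough illustration of the self-training idea described above, the sketch below filters pseudo-labels on unlabelled target data using confidences from both feature types. It is not the authors' formulation: the `QueryPrediction` type, the `select_pseudo_labels` function, the product-based fusion of the two confidences, and the `threshold` value are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueryPrediction:
    """Hypothetical per-query output on an unlabelled target sample."""
    label: int         # predicted class index
    image_conf: float  # confidence decoded from image-view features, in [0, 1]
    bev_conf: float    # confidence decoded from BEV features, in [0, 1]


def select_pseudo_labels(
    preds: List[QueryPrediction], threshold: float = 0.5
) -> List[Optional[int]]:
    """Keep a prediction as a pseudo-label only if the fused confidence is high.

    Assumption: the two confidences are fused by a simple product, so a
    prediction survives only when both views are reasonably confident.
    Rejected predictions are marked None and would be ignored in the
    self-training loss.
    """
    selected: List[Optional[int]] = []
    for p in preds:
        fused = p.image_conf * p.bev_conf
        selected.append(p.label if fused >= threshold else None)
    return selected


# Three hypothetical queries: the second is rejected because the BEV
# confidence is low even though the image-view confidence is high.
example = [
    QueryPrediction(label=2, image_conf=0.9, bev_conf=0.8),
    QueryPrediction(label=1, image_conf=0.9, bev_conf=0.3),
    QueryPrediction(label=0, image_conf=0.6, bev_conf=0.95),
]
print(select_pseudo_labels(example))  # [2, None, 0]
```

The product is only one possible fusion rule; it is used here because it lets either view veto a pseudo-label, which mirrors the summary's point that one feature type regularizes the other.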