BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Jonas Schramm, Niclas Vödisch, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Wolfram Burgard, Abhinav Valada
2024-07-25
Abstract: Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars poses a more inexpensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at http://bevcar.cs.uni-freiburg.de.
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses semantic scene segmentation from a bird's-eye view (BEV). Existing vision-only methods have made significant progress, but they perform poorly under adverse conditions such as rain or nighttime. Active sensors like LiDAR can solve this problem, but their high cost limits widespread application. Fusing camera data with automotive radar data is a more economical alternative, yet it has received comparatively little research attention.

To advance this direction, the authors propose BEVCar, a novel approach for joint BEV object and map segmentation. Its core innovation is to first learn a point-based encoding of the raw radar data and then use that encoding to efficiently initialize the lifting of image features into the BEV space. Through extensive experiments on the nuScenes dataset, the authors demonstrate that BEVCar outperforms current state-of-the-art methods, is more robust under adverse environmental conditions, and segments distant objects more accurately.

### Main Contributions

1. **Proposing BEVCar**: A novel approach for BEV map and object segmentation from camera and radar data.
2. **New Attention Mechanism**: An attention-based image lifting scheme that uses sparse radar points to initialize the queries.
3. **Learning Radar Encoding**: A demonstration that a learning-based radar encoding outperforms using the raw metadata.
4. **Performance Comparison**: Extensive comparisons with previous baseline methods under adverse environmental conditions, showing the advantages of using radar measurements.
5. **Public Data and Code**: Release of the daytime, nighttime, and rainy weather splits of the nuScenes dataset used in the experiments, together with the code and trained models.

### Experimental Results

1. **Vehicle Segmentation Task**: BEVCar outperforms Simple-BEV (+2.7 IoU) in the vehicle segmentation task and performs comparably to BEVGuide (-0.8 IoU) and CRN (-0.4 IoU). Notably, CRN relies on LiDAR during training to learn depth.
2. **Map Segmentation Task**: BEVCar outperforms all baseline methods in the map segmentation task while covering more semantic categories. In the combined evaluation of both tasks, BEVCar achieves the highest performance, surpassing BEVGuide by +2.9 mIoU and CRN by +0.4 mIoU.
3. **Performance under Different Weather and Lighting Conditions**: BEVCar is more robust and achieves higher performance across the evaluated weather and lighting conditions.

### Conclusion

By fusing camera and radar data, BEVCar significantly improves BEV map and object segmentation under adverse environmental conditions. The approach surpasses existing methods in performance, and because it relies on automotive radar rather than LiDAR, it is also the more economical option for real-world applications.
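
The IoU and mIoU figures quoted in the experimental results can be illustrated with a minimal sketch of how intersection over union is typically computed on binary BEV masks. This is a generic illustration, not the authors' evaluation code: the function names `iou` and `mean_iou`, the class names, the toy grids, and the convention for an empty union are all assumptions made for the example, and the summary does not specify the paper's exact evaluation protocol.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds, targets):
    """Average the per-class IoU over all classes present in `targets`."""
    scores = [iou(preds[name], targets[name]) for name in targets]
    return sum(scores) / len(scores)


# Hypothetical 2x3 BEV grids for two classes (toy data, not from nuScenes).
pred = {
    "vehicle": [[1, 1, 0], [0, 0, 0]],
    "drivable_area": [[1, 1, 1], [1, 0, 0]],
}
target = {
    "vehicle": [[1, 0, 0], [0, 0, 0]],
    "drivable_area": [[1, 1, 1], [1, 1, 0]],
}

vehicle_iou = iou(pred["vehicle"], target["vehicle"])  # 1 shared cell / 2 cells in union
miou = mean_iou(pred, target)
print(f"vehicle IoU: {100 * vehicle_iou:.1f}, mIoU: {100 * miou:.1f}")
```

Scores are scaled by 100 when printed because differences such as "+2.7 IoU" are conventionally reported in percentage points.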