Abstract: Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars poses a more inexpensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at http://bevcar.cs.uni-freiburg.de.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses semantic scene segmentation from a bird's-eye view (BEV). Existing vision-only methods have made significant progress, but they perform poorly under adverse conditions such as rain or nighttime. Active sensors like LiDAR can solve this problem, but their high cost limits widespread use. Fusing camera data with automotive radar data is a more economical alternative, yet research in this direction is relatively scarce.

To advance this direction, the authors propose BEVCar, a novel approach for joint BEV object and map segmentation. Its core innovation is to first learn a point-based encoding of raw radar data and then use that encoding to efficiently initialize the lifting of image features into the BEV space. Through extensive experiments on the nuScenes dataset, the authors demonstrate that BEVCar outperforms current state-of-the-art methods, is more robust under adverse environmental conditions, and segments distant objects more accurately.

### Main Contributions

1. **Proposing BEVCar**: A novel approach for BEV map and object segmentation from camera and radar data.
2. **New Attention Mechanism**: An attention-based image lifting scheme that uses sparse radar points to initialize queries.
3. **Learning Radar Encoding**: A demonstration that a learning-based radar encoding outperforms using raw metadata.
4. **Performance Comparison**: Extensive comparisons with previous baseline methods under adverse environmental conditions, showing the advantages of using radar measurements.
5. **Public Data and Code**: The daytime, nighttime, and rain splits of the nuScenes dataset used in the experiments, together with the code and trained models.

### Experimental Results

1. **Vehicle Segmentation Task**: BEVCar outperforms Simple-BEV (+2.7 IoU) in the vehicle segmentation task and performs comparably to BEVGuide (-0.8 IoU) and CRN (-0.4 IoU). It is noteworthy that CRN relies on LiDAR during training to learn depth metrics.
2. **Map Segmentation Task**: BEVCar outperforms all baseline methods in the map segmentation task and provides more semantic category information. In the combined evaluation of both tasks, BEVCar achieves the highest performance, surpassing BEVGuide by +2.9 mIoU and CRN by +0.4 mIoU.
3. **Performance under Different Weather and Lighting Conditions**: BEVCar is more robust and achieves higher performance across the evaluated weather and lighting conditions.

### Conclusion

By fusing camera and radar data, BEVCar significantly improves BEV map and object segmentation performance under adverse environmental conditions. The approach surpasses existing methods in combined performance while relying on radar, a less expensive sensor than LiDAR, which makes it more practical for real-world applications.
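The IoU and mIoU figures quoted in the experimental results are intersection-over-union scores on BEV grids. The following is a minimal sketch of how such scores can be computed, not the authors' evaluation code: the function names `binary_iou` and `mean_iou`, the one-boolean-mask-per-class layout, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration.

```python
import numpy as np


def binary_iou(pred, target):
    """IoU between two boolean BEV masks of identical shape.

    Returns NaN when the class is absent from both masks (empty union),
    so that the caller can skip it when averaging. This convention is an
    assumption, not something stated in the summary above.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return float("nan")
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


def mean_iou(preds, targets):
    """Average per-class IoU over a dict of class name -> boolean mask.

    Classes whose IoU is NaN (absent everywhere) are ignored.
    """
    ious = [binary_iou(preds[name], targets[name]) for name in targets]
    return float(np.nanmean(ious))


# Toy 2x3 BEV grids (hypothetical data, for illustration only).
pred = {
    "vehicle": np.array([[1, 1, 0], [0, 0, 0]]),
    "road": np.array([[1, 1, 0], [0, 0, 0]]),
}
target = {
    "vehicle": np.array([[1, 0, 0], [1, 0, 0]]),
    "road": np.array([[1, 1, 0], [1, 0, 0]]),
}

vehicle_iou = binary_iou(pred["vehicle"], target["vehicle"])  # 1 / 3
road_iou = binary_iou(pred["road"], target["road"])           # 2 / 3
print(vehicle_iou, road_iou, mean_iou(pred, target))
```

Scores computed this way lie in [0, 1]; the deltas in the summary (for example +2.7 IoU) read as percentage points, which corresponds to multiplying these values by 100.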

BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

BEV-Radar: Bidirectional Radar-Camera Fusion for 3D Object Detection

BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network

RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

Traffic Object Detection for Autonomous Driving Fusing LiDAR and Pseudo 4D-Radar under Bird’s-Eye-View

MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features

Bridging the View Disparity Between Radar and Camera Features for Multi-Modal Fusion 3D Object Detection

Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception