Abstract: Current research in semantic bird's-eye view (BEV) segmentation for autonomous driving focuses solely on optimizing neural network models using a single dataset, typically nuScenes. This practice leads to the development of highly specialized models that may fail when faced with different environments or sensor setups, a problem known as domain shift. In this paper, we conduct a comprehensive cross-dataset evaluation of state-of-the-art BEV segmentation models to assess their performance across different training and testing datasets and setups, as well as different semantic categories. We investigate the influence of different sensors, such as cameras and LiDAR, on the models' ability to generalize to diverse conditions and scenarios. Additionally, we conduct multi-dataset training experiments that improve models' BEV segmentation performance compared to single-dataset training. Our work addresses the gap in evaluating BEV segmentation models under cross-dataset validation, and our findings underscore the importance of enhancing model generalizability and adaptability to ensure more robust and reliable BEV segmentation approaches for autonomous driving applications. The code for this paper is available at https://github.com/manueldiaz96/beval.

What problem does this paper attempt to address?

The paper addresses the fact that current bird's-eye-view (BEV) semantic segmentation models for autonomous driving are optimized and evaluated on a single dataset, so they may fail when facing different environments or sensor setups, that is, under domain shift. Specifically:

1. **Domain shift problem**: Most existing BEV segmentation models are trained and tested on nuScenes alone, so they may perform poorly on data from different environments or sensor configurations.
2. **Insufficient generalization ability of models**: Without cross-dataset validation, the generalization ability of these models has not been fully evaluated, which leaves their reliability and robustness in practical applications uncertain.

To address these problems, the authors propose a cross-dataset evaluation framework that measures the performance of BEV segmentation models across different training and testing datasets, different sensor configurations, and different semantic categories. This gives researchers a more complete picture of the models' generalization ability and adaptability, and helps make the models more reliable in diverse real-world scenarios.

### Main contributions

1. **First cross-dataset validation framework**: The framework can be extended to more models, datasets, and semantic categories, providing a flexible and general evaluation method.
2. **Multi-dataset experiments**: Three state-of-the-art BEV segmentation models are evaluated on two large-scale real-world datasets (nuScenes and Woven Planet), covering different input sensor modalities and three semantic segmentation categories.
3. **Multi-dataset training**: The effect of training models on multiple datasets simultaneously is studied as a way to improve the models' generalization ability.
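
The cross-dataset evaluation described above can be pictured as a train/test grid: one model per training dataset, scored on every test dataset. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes intersection over union (IoU) on binary BEV masks as the score, and the names `iou`, `cross_dataset_grid`, `train_fn`, and `predict_fn` are hypothetical placeholders for whatever training and inference code a real model provides.

```python
from itertools import product


def iou(pred, target):
    """Intersection over union of two binary BEV masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def cross_dataset_grid(train_fn, predict_fn, datasets):
    """Train once per dataset, then score each model on every dataset.

    `datasets` maps a name to {"train": ..., "val": [(input, mask), ...]}.
    Returns {(train_name, test_name): mean IoU}.
    """
    # One model per training dataset (hypothetical train_fn).
    models = {name: train_fn(data["train"]) for name, data in datasets.items()}
    results = {}
    for train_name, test_name in product(datasets, repeat=2):
        samples = datasets[test_name]["val"]
        scores = [iou(predict_fn(models[train_name], x), y) for x, y in samples]
        results[(train_name, test_name)] = sum(scores) / len(scores)
    return results
```

In such a grid, the diagonal entries (same training and test dataset) correspond to the usual single-dataset evaluation, while the off-diagonal entries expose the domain-shift gap. Multi-dataset training would add one more row, with a model trained on the union of the training sets.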

### Method overview

- **Dataset selection**: The nuScenes and Woven Planet datasets are selected because they have similar sensor configurations and are highly representative in the BEV segmentation field.
- **Point cloud processing**: To compensate for the difference in point cloud density between the two datasets, the Woven Planet point clouds are down-sampled.
- **Image processing**: The image sizes and pre-processing methods of the two datasets are adjusted to ensure consistency.
- **Model selection**: Three different types of BEV segmentation models are selected: the LSS model, which uses only cameras; the LAPT model, which fuses cameras and LiDAR early; and the LAPT-PP model, which fuses cameras and LiDAR late.
- **Experimental design**: Models are trained on single datasets and on multiple datasets, and tested on different datasets to evaluate their cross-dataset generalization ability.

Through these methods, the authors reveal the performance differences of existing models across datasets and emphasize the importance of improving the models' generalization ability and adaptability.
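
To make the point cloud processing step more concrete, here is a minimal sketch of down-sampling a dense LiDAR sweep to a fixed point budget. The summary does not say how the down-sampling is done, so uniform random subsampling is an assumption chosen for illustration and may differ from the authors' procedure. The function name `downsample_point_cloud` and the parameters `target_count` and `seed` are hypothetical.

```python
import random


def downsample_point_cloud(points, target_count, seed=0):
    """Keep at most `target_count` points from a list of (x, y, z) tuples.

    Uses uniform random subsampling (an assumed strategy) and preserves
    the original ordering of the points that are kept.
    """
    if target_count < 0:
        raise ValueError("target_count must be non-negative")
    if len(points) <= target_count:
        # Already sparse enough: return an unchanged copy.
        return list(points)
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    keep = sorted(rng.sample(range(len(points)), target_count))
    return [points[i] for i in keep]
```

A fixed seed keeps the subsampling reproducible across runs. A sensor-aware alternative, such as dropping whole LiDAR beams, would preserve the scan pattern better but needs per-point beam indices, which this sketch does not assume are available.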

BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving
