BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving

Manuel Alejandro Diaz-Zapata, Wenqian Liu, Robin Baruffa, Christian Laugier
2024-09-12
Abstract: Current research in semantic bird's-eye view (BEV) segmentation for autonomous driving focuses solely on optimizing neural network models using a single dataset, typically nuScenes. This practice leads to the development of highly specialized models that may fail when faced with different environments or sensor setups, a problem known as domain shift. In this paper, we conduct a comprehensive cross-dataset evaluation of state-of-the-art BEV segmentation models to assess their performance across different training and testing datasets and setups, as well as different semantic categories. We investigate the influence of different sensors, such as cameras and LiDAR, on the models' ability to generalize to diverse conditions and scenarios. Additionally, we conduct multi-dataset training experiments that improve the models' BEV segmentation performance compared to single-dataset training. Our work addresses the gap in evaluating BEV segmentation models under cross-dataset validation, and our findings underscore the importance of enhancing model generalizability and adaptability to ensure more robust and reliable BEV segmentation approaches for autonomous driving applications. The code for this paper is available at https://github.com/manueldiaz96/beval.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses the fact that current bird's-eye view (BEV) semantic segmentation models for autonomous driving are optimized and evaluated on a single dataset, which may cause them to fail when they face different environments or sensor setups, a problem known as domain shift. Specifically:

1. **Domain shift**: Most existing BEV segmentation models are trained and tested only on nuScenes, so they may perform poorly on data from different environments or sensor configurations.
2. **Insufficient evaluation of generalization**: Because cross-dataset validation is rarely performed, the generalization ability of these models has not been fully evaluated, which undermines confidence in their reliability and robustness in practical applications.

To address these problems, the authors propose a cross-dataset evaluation framework that measures the performance of BEV segmentation models across different training and testing datasets, sensor configurations, and semantic categories. This gives researchers a more complete picture of how well the models generalize and adapt, and helps make them more reliable in diverse real-world scenarios.

### Main contributions

1. **A cross-dataset validation framework, proposed for the first time**: The framework can be extended to more models, datasets, and semantic categories, providing a flexible and general evaluation method.
2. **Multi-dataset experiments**: Three state-of-the-art BEV segmentation models are evaluated on two large-scale real-world datasets (nuScenes and Woven Planet), covering different input sensor modalities and three semantic categories.
3. **Multi-dataset training**: The authors study the effect of training models on multiple datasets simultaneously as a way to improve their generalization ability.
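The multi-dataset training contribution can be illustrated with a minimal sketch, assuming the simplest mixing strategy: pool the sample indices of every dataset and shuffle them each epoch, so mini-batches contain both nuScenes and Woven Planet samples. The summary does not say how the authors actually mix the datasets; `mixed_epoch`, `batches`, and the dataset sizes are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (dataset name, index of the sample inside that dataset)


def mixed_epoch(dataset_sizes: Dict[str, int], seed: int = 0) -> List[Sample]:
    """Hypothetical helper: one shuffled epoch over the union of all datasets.

    Every sample of every dataset appears exactly once, in random order, so
    mini-batches mix samples from the different datasets.
    """
    samples = [(name, idx) for name, size in dataset_sizes.items() for idx in range(size)]
    random.Random(seed).shuffle(samples)  # seeded for reproducibility
    return samples


def batches(samples: List[Sample], batch_size: int) -> List[List[Sample]]:
    """Split an epoch into consecutive mini-batches; the last one may be smaller."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


if __name__ == "__main__":
    # Made-up dataset sizes, for illustration only.
    epoch = mixed_epoch({"nuscenes": 6, "woven_planet": 4}, seed=42)
    for batch in batches(epoch, batch_size=4):
        print(batch)
```

Plain pooling lets the larger dataset dominate each epoch; a balanced or weighted sampler is a common alternative when dataset sizes differ a lot. In PyTorch, `torch.utils.data.ConcatDataset` with a shuffling `DataLoader` plays a role similar to `mixed_epoch`.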
### Method overview

- **Dataset selection**: nuScenes and Woven Planet are chosen because they have similar sensor configurations and are representative datasets in the BEV segmentation field.
- **Point cloud processing**: To reduce the difference in point cloud density between the two datasets, the Woven Planet point clouds are down-sampled.
- **Image processing**: Image sizes and pre-processing are aligned across the two datasets to ensure consistency.
- **Model selection**: Three types of BEV segmentation models are used: LSS, which uses cameras only; LAPT, which fuses camera and LiDAR data early; and LAPT-PP, which fuses camera and LiDAR data late.
- **Experimental design**: Models are trained on each dataset individually and on both datasets jointly, then tested on each dataset to evaluate their cross-dataset generalization.

Through these experiments, the authors reveal performance differences of existing models across datasets and emphasize the importance of improving model generalization and adaptability.
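The point cloud processing step can be sketched as follows. The summary says only that Woven Planet is down-sampled to reduce the density gap; random subsampling to a target point count is one simple scheme assumed here (decimating LiDAR beams is another), and `downsample` and the `(x, y, z, intensity)` layout are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Assumed point layout for illustration: (x, y, z, intensity).
Point = Tuple[float, float, float, float]


def downsample(points: Sequence[Point], target_count: int, seed: int = 0) -> List[Point]:
    """Hypothetical helper: randomly keep at most `target_count` points.

    Sampling is without replacement, and the kept points stay in their original
    order. Clouds that are already sparse enough are returned unchanged.
    """
    if target_count < 0:
        raise ValueError("target_count must be non-negative")
    if len(points) <= target_count:
        return list(points)
    rng = random.Random(seed)  # seeded so the same sweep is always reduced the same way
    keep = sorted(rng.sample(range(len(points)), target_count))
    return [points[i] for i in keep]


if __name__ == "__main__":
    dense = [(float(i), 0.0, 0.0, 1.0) for i in range(1000)]
    sparse = downsample(dense, target_count=250, seed=7)
    print(len(sparse))  # 250
```

In such a setup, `target_count` might be set to a typical point count per nuScenes sweep, so that both datasets present the model with similar densities.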
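Aligning image sizes across datasets also means updating each camera's intrinsic matrix, because camera-based BEV models such as LSS use the intrinsics to lift image features into 3D. For a uniform scale s followed by a crop with top-left offset (x0, y0), the focal lengths become s·fx and s·fy, and the principal point becomes (s·cx − x0, s·cy − y0). The sketch below assumes a crop that is centered horizontally and keeps the bottom rows; the 1600x900 source size, 352x128 target size, and the intrinsic values are illustrative, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]


def resize_crop_params(src_wh: Tuple[int, int], dst_wh: Tuple[int, int]) -> Tuple[float, Tuple[int, int]]:
    """Hypothetical helper: uniform scale and (x0, y0) crop offset onto dst_wh.

    The image is scaled so it covers the target, then cropped: centered
    horizontally, and keeping the bottom rows (road) rather than the top (sky).
    """
    (w, h), (tw, th) = src_wh, dst_wh
    s = max(tw / w, th / h)
    rw, rh = round(w * s), round(h * s)  # resized size, at least the target size
    return s, ((rw - tw) // 2, rh - th)


def adjust_intrinsics(K: Matrix3, scale: float, crop_xy: Tuple[int, int]) -> Matrix3:
    """Update a 3x3 pinhole intrinsic matrix after scaling by `scale`, then cropping.

    Scaling multiplies focal lengths, skew and principal point by `scale`;
    cropping shifts the principal point by the crop offset.
    """
    x0, y0 = crop_xy
    return [
        [K[0][0] * scale, K[0][1] * scale, K[0][2] * scale - x0],
        [0.0, K[1][1] * scale, K[1][2] * scale - y0],
        [0.0, 0.0, 1.0],
    ]


def project(K: Matrix3, xyz: Tuple[float, float, float]) -> Tuple[float, float]:
    """Project a camera-frame 3D point to pixel coordinates."""
    x, y, z = xyz
    return ((K[0][0] * x + K[0][1] * y) / z + K[0][2], K[1][1] * y / z + K[1][2])


if __name__ == "__main__":
    s, crop = resize_crop_params((1600, 900), (352, 128))  # illustrative sizes
    K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]  # made-up intrinsics
    K_new = adjust_intrinsics(K, s, crop)
    u, v = project(K, (1.0, 2.0, 10.0))
    # Projecting with K_new should match scaling then cropping the original pixel.
    print(project(K_new, (1.0, 2.0, 10.0)), (u * s - crop[0], v * s - crop[1]))
```

If the intrinsics were left unchanged after resizing, the lifted features would land at the wrong BEV locations, which would show up as an apparent domain gap unrelated to the scenes themselves.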
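The experimental design amounts to filling a train-by-test matrix of IoU scores for each semantic category, with rows for the training configurations (nuScenes, Woven Planet, or both) and columns for the test datasets. The sketch below assumes dataset-level IoU (intersections and unions summed over the whole test set) on flattened binary BEV masks; the toy predictors and masks are hypothetical stand-ins for trained models and real data.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Mask = Sequence[bool]  # flattened binary BEV grid for one semantic category


def dataset_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """IoU over a whole test set: summed intersections divided by summed unions."""
    inter = union = 0
    for pred, gt in pairs:
        if len(pred) != len(gt):
            raise ValueError("prediction and ground-truth grids differ in size")
        inter += sum(bool(p) and bool(g) for p, g in zip(pred, gt))
        union += sum(bool(p) or bool(g) for p, g in zip(pred, gt))
    return inter / union if union else float("nan")


def cross_dataset_matrix(
    models: Dict[str, Callable[[Any], Mask]],
    test_sets: Dict[str, List[Tuple[Any, Mask]]],
) -> Dict[Tuple[str, str], float]:
    """IoU for every (training configuration, test dataset) pair.

    `models` maps a training configuration to a predictor and `test_sets` maps a
    dataset name to (input, ground-truth mask) pairs; both are hypothetical stand-ins.
    """
    return {
        (train_name, test_name): dataset_iou((model(x), gt) for x, gt in samples)
        for train_name, model in models.items()
        for test_name, samples in test_sets.items()
    }


if __name__ == "__main__":
    # Toy "models" and masks, for illustration only.
    models = {
        "copy_input": lambda x: x,
        "all_occupied": lambda x: [True] * len(x),
    }
    test_sets = {
        "A": [([True, False, True, False], [True, False, True, False])],
        "B": [([False, True, False, False], [False, True, True, False])],
    }
    for (train, test), iou in cross_dataset_matrix(models, test_sets).items():
        print(f"train={train:12s} test={test}  IoU={iou:.2f}")
```

Comparing a row's diagonal entry (same training and test dataset) with its off-diagonal entries gives a direct read of the cross-dataset drop for that training configuration.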