DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning

Senthil Yogamani, David Unger, Venkatraman Narayanan, Varun Ravi Kumar
2024-04-09
Abstract: Semantic segmentation is an effective way to perform scene understanding. Recently, segmentation in 3D Bird's Eye View (BEV) space has become popular because it is directly used by the drive policy. However, there is limited work on BEV segmentation for surround-view fisheye cameras, which are commonly used in commercial vehicles. As this task has no real-world public dataset and existing synthetic datasets do not handle amodal regions due to occlusion, we create a synthetic dataset using the Cognata simulator comprising diverse road types, weather, and lighting conditions. We generalize BEV segmentation to work with any camera model; this is useful for mixing diverse cameras. We implement a baseline by applying cylindrical rectification to the fisheye images and using a standard LSS-based BEV segmentation model. We demonstrate that we can achieve better performance without undistortion, which has the adverse effects of increased runtime due to pre-processing, reduced field-of-view, and resampling artifacts. Further, we introduce a distortion-aware learnable BEV pooling strategy that is more effective for fisheye cameras. We extend the model with an occlusion reasoning module, which is critical for estimation in BEV space. Qualitative performance of DaF-BEVSeg is showcased in the video at
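To make the "any camera model" generalization concrete, the sketch below puts a pinhole model and a fisheye model behind the same `project` method, the kind of interface a camera-agnostic BEV lifting step could call. The summary does not state which fisheye model the paper uses: the class names, the intrinsic values, and the polynomial radial model ρ(θ) = k1·θ + k2·θ² + k3·θ³ + k4·θ⁴ are assumptions, chosen because this form is common for fisheye lenses. The example also illustrates why undistorting to a pinhole view reduces the field of view: a pinhole model cannot image rays at or beyond 90° from the optical axis, while the fisheye model can.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Pixel = Tuple[float, float]


@dataclass
class PinholeCamera:
    fx: float
    fy: float
    cx: float
    cy: float

    def project(self, x: float, y: float, z: float) -> Optional[Pixel]:
        # A pinhole model only images points in front of the camera (z > 0),
        # so its field of view is strictly below 180 degrees.
        if z <= 0:
            return None
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


@dataclass
class PolynomialFisheyeCamera:
    # ASSUMED model: radial distance rho(theta) = k1*t + k2*t^2 + k3*t^3 + k4*t^4,
    # where t is the angle between the ray and the optical axis.
    k: Tuple[float, float, float, float]
    cx: float
    cy: float
    max_theta: float = math.radians(95.0)  # assumed half-FOV of a ~190 degree lens

    def project(self, x: float, y: float, z: float) -> Optional[Pixel]:
        r = math.hypot(x, y)
        theta = math.atan2(r, z)  # angle from the optical axis, in [0, pi]
        if theta > self.max_theta:
            return None  # outside the lens field of view
        if r == 0.0:
            return (self.cx, self.cy)  # ray along the optical axis
        rho = sum(ki * theta ** (i + 1) for i, ki in enumerate(self.k))
        # Place the pixel at distance rho from the principal point,
        # in the same azimuthal direction as the 3D ray.
        return (self.cx + rho * x / r, self.cy + rho * y / r)


# Illustrative intrinsics (hypothetical values, not from the paper).
pin = PinholeCamera(fx=300.0, fy=300.0, cx=640.0, cy=480.0)
fish = PolynomialFisheyeCamera(k=(330.0, 0.0, 0.0, 0.0), cx=640.0, cy=480.0)

t = math.radians(92.0)  # a ray 92 degrees off the optical axis
ray = (math.sin(t), 0.0, math.cos(t))
print(pin.project(*ray))   # None: the ray points behind the pinhole image plane
print(fish.project(*ray))  # a valid pixel to the right of the principal point
```

Because both classes expose the same method, a BEV lifting stage written against `project` does not need to know which lens produced an image, which is what makes mixing diverse cameras straightforward.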
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
The main problem this paper addresses is that existing Bird's Eye View (BEV) semantic segmentation methods are poorly suited to fisheye camera images. Specifically:

1. **Lack of real-world datasets**: There is currently no publicly available real-world dataset for BEV semantic segmentation with fisheye cameras, and existing synthetic datasets do not handle amodal regions, i.e. areas hidden by occlusion.
2. **Fisheye-specific challenges**: Because of their much wider field of view (FOV), fisheye cameras exhibit strong image distortion, so methods built around the traditional pinhole camera model cannot be applied to them directly. Dedicated methods are therefore needed to handle fisheye distortion.
3. **Occlusion reasoning**: Correctly handling occluded areas in BEV space is crucial for scene understanding, especially in urban driving and parking scenarios, where occlusions are very common.

To address these issues, the paper makes the following main contributions:

- **A fisheye BEV segmentation dataset**: A synthetic dataset generated with a commercial-grade simulator (Cognata), covering diverse road types, weather, and lighting conditions, and providing occlusion masks.
- **A distortion-aware learnable pooling strategy**: The pooling adapts to each camera through its intrinsics, which handles fisheye distortion effectively.
- **A general framework**: It produces BEV semantic segmentation directly from raw images and supports various camera models.
- **An end-to-end multi-task model**: Besides predicting semantic classes, it performs occlusion reasoning in ambiguous scenes.

Together, these contributions aim to improve the accuracy and robustness of BEV semantic segmentation with fisheye cameras.
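Occlusion (amodal) labels mark BEV cells whose semantic class is known from the simulator but which the cameras cannot actually see. The summary does not describe how the dataset's occlusion masks are produced, so the following is only a hypothetical 2D illustration of what such a label means: it ray-casts from a single ego cell across a BEV occupancy grid using Bresenham lines and marks a cell as occluded when an occupied cell lies strictly between it and the sensor. The function names `bresenham` and `occlusion_mask` are illustrative; a real pipeline would more plausibly use the simulator's 3D geometry and every camera's frustum.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 1 = occupied (e.g. vehicle, building), 0 = free space
Cell = Tuple[int, int]  # (row, col)


def bresenham(r0: int, c0: int, r1: int, c1: int) -> List[Cell]:
    """All grid cells on the line from (r0, c0) to (r1, c1), endpoints included."""
    cells: List[Cell] = []
    dc, dr = abs(c1 - c0), -abs(r1 - r0)
    sc = 1 if c1 > c0 else -1
    sr = 1 if r1 > r0 else -1
    err = dc + dr
    r, c = r0, c0
    while True:
        cells.append((r, c))
        if (r, c) == (r1, c1):
            break
        e2 = 2 * err
        if e2 >= dr:  # step along columns
            err += dr
            c += sc
        if e2 <= dc:  # step along rows
            err += dc
            r += sr
    return cells


def occlusion_mask(occupancy: Grid, sensor: Cell) -> Grid:
    """1 where a cell is hidden from `sensor` by an occupied cell in between.

    HYPOTHETICAL illustration: single 2D viewpoint, no camera frustum or height.
    """
    rows, cols = len(occupancy), len(occupancy[0])
    mask = [[0] * cols for _ in range(rows)]
    s_r, s_c = sensor
    for r in range(rows):
        for c in range(cols):
            # Exclude both endpoints: the sensor cell and the target cell itself,
            # so the visible front face of an object is not marked as occluded.
            between = bresenham(s_r, s_c, r, c)[1:-1]
            if any(occupancy[i][j] for i, j in between):
                mask[r][c] = 1
    return mask


grid = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for row in occlusion_mask(grid, sensor=(0, 0)):
    print(row)
```

Excluding the target cell from the ray means an occupied cell can still be visible while the free space behind it is labeled occluded, which is the distinction an occlusion reasoning head has to learn.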