Abstract: Semantic segmentation is an effective way to perform scene understanding. Recently, segmentation in 3D Bird's Eye View (BEV) space has become popular because it is directly used by the drive policy. However, there is limited work on BEV segmentation for surround-view fisheye cameras, which are commonly used in commercial vehicles. As this task has no real-world public dataset and existing synthetic datasets do not handle amodal regions due to occlusion, we create a synthetic dataset using the Cognata simulator comprising diverse road types, weather, and lighting conditions. We generalize the BEV segmentation to work with any camera model; this is useful for mixing diverse cameras. We implement a baseline by applying cylindrical rectification on the fisheye images and using a standard LSS-based BEV segmentation model. We demonstrate that we can achieve better performance without undistortion, which has the adverse effects of increased runtime due to pre-processing, reduced field-of-view, and resampling artifacts. Further, we introduce a distortion-aware learnable BEV pooling strategy that is more effective for the fisheye cameras. We extend the model with an occlusion reasoning module, which is critical for estimating in BEV space. Qualitative performance of DaF-BEVSeg is showcased in the video at

What problem does this paper attempt to address?

The main problem this paper attempts to address is the inadequacy of existing Bird's Eye View (BEV) semantic segmentation methods when applied to fisheye camera images. Specifically:

1. **Lack of real-world datasets**: There is no publicly available real-world dataset for BEV semantic segmentation using fisheye cameras, and existing synthetic datasets do not adequately handle amodal regions caused by occlusions.
2. **Special challenges of fisheye cameras**: Fisheye cameras trade a larger field of view (FOV) for stronger image distortion, so methods built around the traditional pinhole camera model cannot be applied directly. Specialized methods are needed to handle this distortion.
3. **Occlusion reasoning**: Proper handling of occluded areas in BEV space is crucial for scene understanding, especially in urban driving and parking scenarios where occlusions are very common.

To address these issues, the paper proposes the following main contributions:

- **Creation of a fisheye BEV segmentation dataset**: A synthetic dataset is generated using a commercial-grade simulator, containing various road types, weather, and lighting conditions, and providing occlusion masks.
- **Design of a new distortion-aware learnable pooling strategy**: This strategy adapts using camera intrinsics to effectively handle the distortion of fisheye cameras.
- **Proposal of a general framework**: This framework generates BEV semantic segmentation from raw images and supports various camera models.
- **Development of an end-to-end multi-task model**: This model not only provides semantic categories but also performs occlusion reasoning in ambiguous scenes.

Through these methods, the paper aims to improve the performance and robustness of BEV semantic segmentation from fisheye cameras.
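To make the "supports various camera models" point concrete, the sketch below shows one way a lift step could be written against an interchangeable unprojection function, so the same code path serves a pinhole or a fisheye camera. It is an illustration only and not the paper's implementation: the equidistant model (r = f·θ) is assumed as a stand-in for the fisheye lens, depth is assumed to be measured along the viewing ray, and the names `unproject_pinhole`, `unproject_equidistant`, and `lift_pixel` are hypothetical.

```python
import math

def unproject_pinhole(u, v, f, cx, cy):
    """Pixel -> unit viewing ray for an ideal pinhole camera (r = f * tan(theta))."""
    x, y = (u - cx) / f, (v - cy) / f
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)

def unproject_equidistant(u, v, f, cx, cy):
    """Pixel -> unit viewing ray for an equidistant fisheye (r = f * theta).

    Assumption: the equidistant model is used here purely for illustration.
    """
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, 1.0)  # principal point looks straight along the optical axis
    theta = r / f               # angle between the ray and the optical axis
    s = math.sin(theta) / r
    return (dx * s, dy * s, math.cos(theta))

def lift_pixel(unproject, u, v, depths, **intrinsics):
    """Place one pixel at each candidate depth along its viewing ray.

    Assumption: depth is the distance along the ray, not the z-coordinate.
    """
    ray = unproject(u, v, **intrinsics)
    return [tuple(d * c for c in ray) for d in depths]

# Hypothetical intrinsics, chosen only for the example.
intr = dict(f=300.0, cx=640.0, cy=480.0)

# A pixel at image radius f * pi / 2 corresponds to a ray 90 degrees off-axis,
# which the fisheye model represents but a pinhole projection cannot.
u_side = intr["cx"] + intr["f"] * math.pi / 2
side_ray = unproject_equidistant(u_side, intr["cy"], **intr)
points = lift_pixel(unproject_equidistant, u_side, intr["cy"], [1.0, 2.0, 4.0], **intr)
```

Because only the unprojection function changes between camera types, cameras with different models can in principle be mixed in one rig by passing a different function and intrinsics per camera.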

DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

F2BEV: Bird's Eye View Generation from Surround-View Fisheye Camera Images for Automated Driving

BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras

Universal Semantic Segmentation for Fisheye Urban Driving Images

A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View

SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection

RSBEV: Multi-view Collaborative Segmentation of 3D Remote Sensing Scenes with Bird’s-Eye-View Representation

Improving Bird’s Eye View Semantic Segmentation by Task Decomposition

Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation

Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

UIF-BEV: an Underlying Information Fusion Framework for Bird's-Eye-View Semantic Segmentation

CNN Based Semantic Segmentation for Urban Traffic Scenes Using Fisheye Camera

Surround-View Fisheye BEV-Perception for Valet Parking: Dataset, Baseline and Distortion-Insensitive Multi-Task Framework

BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation