Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It perceives the environment surrounding the ego vehicle by projecting 2D multi-view images into 3D world space. Recent progress in BEV segmentation has been attributed to better view transformation modules, larger image encoders, or additional temporal information. However, two issues remain: 1) BEV space features are not effectively understood or enhanced, particularly when capturing long-distance environmental features, and 2) fine details of target objects are difficult to recognize. To address these issues, we propose OE-BevSeg, an end-to-end multimodal framework that improves BEV segmentation through global environment-aware perception and local target object enhancement. OE-BevSeg employs an environment-aware BEV compressor: based on the prior knowledge that the main composition of the BEV surrounding environment varies as the distance interval increases, it uses long-sequence global modeling to improve the model's understanding and perception of the environment. To enrich target object information in the segmentation results, we introduce a center-informed object enhancement module, which uses centerness information to supervise and guide the segmentation head, improving segmentation at the local level. Additionally, we design a multimodal fusion branch that integrates multi-view RGB image features with radar/LiDAR features, achieving significant performance improvements. Extensive experiments show that, in both camera-only and multimodal fusion BEV segmentation tasks, our approach achieves state-of-the-art results by a large margin on the nuScenes dataset for vehicle segmentation, demonstrating its applicability to autonomous driving.
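
The abstract does not define how the centerness signal is computed. As an illustration only, the sketch below builds one plausible centerness target: a Gaussian heatmap on the BEV grid that peaks at each object center. The function name `centerness_map`, the `(row, col)` input format, the 200×200 grid, and `sigma` are all assumptions made for this example and are not taken from OE-BevSeg.

```python
import numpy as np

def centerness_map(centers, grid_size=(200, 200), sigma=3.0):
    """Return an (H, W) float32 map in [0, 1] that peaks at each object center.

    centers: iterable of (row, col) BEV cell coordinates (hypothetical format).
    sigma:   Gaussian spread in grid cells (assumed hyperparameter).
    """
    h, w = grid_size
    rows, cols = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float32)
    for r, c in centers:
        # Squared distance of every cell to this object's center.
        d2 = (rows - r) ** 2 + (cols - c) ** 2
        # Keep the strongest response where objects overlap.
        gauss = np.exp(-d2 / (2.0 * sigma ** 2)).astype(np.float32)
        heat = np.maximum(heat, gauss)
    return heat

# Two hypothetical vehicle centers on a 200x200 BEV grid.
target = centerness_map([(50, 60), (120, 140)])
```

A map like `target` could serve as a regression target for an auxiliary head next to the segmentation head; how the paper actually couples the two is not stated in the abstract.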

BEVoxSeg: BEV-Voxel Representation for Fast and Accurate Camera-Based 3D Segmentation

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Bird’s-Eye View Semantic Segmentation and Voxel Semantic Segmentation Based on Frustum Voxel Modeling and Monocular Camera

BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud

M$^2$BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

Local-to-Global Perception Network for Point Cloud Segmentation

Efficient Urban-scale Point Clouds Segmentation with BEV Projection

GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features

OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

An Efficient Encoding Voxel-Based Segmentation (EVBS) Algorithm Based on Fast Adjacent Voxel Search for Point Cloud Plane Segmentation

3D-BEVIS: Bird's-Eye-View Instance Segmentation

PointBeV: A Sparse Approach to BeV Predictions

BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization