Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It perceives the environment surrounding the ego vehicle by projecting 2D multi-view images into 3D world space. BEV segmentation has recently made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, two issues remain: 1) BEV space features are not effectively understood and enhanced, particularly when it comes to accurately capturing long-distance environmental features, and 2) fine details of target objects are not recognized well. To address these issues, we propose OE-BevSeg, an end-to-end multimodal framework that enhances BEV segmentation performance through global environment-aware perception and local target object enhancement. OE-BevSeg employs an environment-aware BEV compressor. Based on the prior knowledge that the main composition of the surrounding BEV environment varies as the distance interval increases, it uses long-sequence global modeling to improve the model's understanding and perception of the environment. To enrich target object information in the segmentation results, we introduce a center-informed object enhancement module, which uses centerness information to supervise and guide the segmentation head, thereby improving segmentation from a local perspective. Additionally, we design a multimodal fusion branch that integrates multi-view RGB image features with radar/LiDAR features, achieving significant performance improvements. Extensive experiments show that, in both camera-only and multimodal fusion BEV segmentation tasks, our approach achieves state-of-the-art results by a large margin on the nuScenes dataset for vehicle segmentation, demonstrating superior applicability in the field of autonomous driving.
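
To make the two ideas in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that centerness supervision can be represented as a Gaussian heatmap peaking at each object center, and that "distance intervals" can be modeled as concentric rings around an ego vehicle placed at the grid center. The function names, the `sigma` parameter, and the ring count are all hypothetical choices made for illustration.

```python
import math


def centerness_target(centers, height, width, sigma=3.0):
    """Build a centerness map on a BEV grid (assumed Gaussian form).

    centers: iterable of (row, col) cell coordinates of object centers
             (hypothetical input format).
    Returns a height x width list of lists with values in (0, 1],
    equal to 1.0 exactly at each object center.
    """
    target = [[0.0] * width for _ in range(height)]
    for center_row, center_col in centers:
        for r in range(height):
            for c in range(width):
                d2 = (r - center_row) ** 2 + (c - center_col) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                # Keep the strongest response when objects overlap.
                if value > target[r][c]:
                    target[r][c] = value
    return target


def distance_ring_index(height, width, num_rings=4):
    """Assign every BEV cell to a distance interval (ring) around the ego.

    The ego vehicle is assumed to sit at the grid center; ring 0 is the
    nearest interval and ring num_rings - 1 the farthest.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    max_dist = math.hypot(cy, cx) or 1.0  # avoid division by zero on 1x1 grids
    rings = []
    for r in range(height):
        row = []
        for c in range(width):
            dist = math.hypot(r - cy, c - cx)
            row.append(min(int(num_rings * dist / max_dist), num_rings - 1))
        rings.append(row)
    return rings


if __name__ == "__main__":
    heatmap = centerness_target([(10, 20)], height=50, width=50)
    rings = distance_ring_index(50, 50, num_rings=4)
    print(heatmap[10][20], rings[25][25], rings[0][0])
```

In a training setup along these lines, a map like `heatmap` could serve as an auxiliary regression target next to the segmentation mask, and the ring indices could be used to group BEV cells by distance before sequence modeling. How OE-BevSeg actually builds its targets and sequences is not specified in the abstract.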

BEV Perception for Autonomous Driving: State of the Art and Future Perspectives

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Vision-Centric BEV Perception: A Survey

Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

Vision-Driven 2D Supervised Fine-Tuning Framework for Bird's Eye View Perception

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird's-Eye-View Representation

Bird’s Eye View Map for End-to-end Autonomous Driving Using Reinforcement Learning

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving

OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

Hierarchical End-to-End Autonomous Driving: Integrating BEV Perception with Deep Reinforcement Learning

BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping

M-BEV: Masked BEV Perception for Robust Autonomous Driving

BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving

Surround-View Vision-based 3D Detection for Autonomous Driving: A Survey