Abstract: Bird's eye view (BEV) semantic maps have become a crucial element of urban intelligent traffic management and monitoring, offering visual data representations that support informed decision making in intelligent cities. Nevertheless, current methods still underutilize the temporal information embedded in dynamic frames during the BEV feature transformation. This limitation reduces accuracy when mapping high-speed moving objects, particularly in capturing their shape and dynamic trajectory. To address this challenge, a framework for cross-view semantic segmentation is proposed, using simulated environments as a starting point before it is applied to real-life urban intelligent transportation scenarios. The view converter module is designed to collate information from multiple initial view observations captured from various angles and modes. To preserve useful temporal information in the BEV transformation, this module outputs a top-down view semantic graph characterized by its object space layout. A novel application is also devised that uses transformer networks to map images and video sequences into top-down, or bird's-eye, views. The approach combines physics-based and constraint-based formulations, and ablation studies substantiate it, highlighting the significance of context above and below a given point in generating these maps. The method is evaluated on the nuScenes dataset, where it yields state-of-the-art instantaneous mapping results, with particular benefits for smaller dynamic object categories. The experiments also compare axial attention with the state-of-the-art (SOTA) model, demonstrating the performance gain associated with temporal awareness.
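
The remark about context above and below a given point can be illustrated with attention applied along the vertical axis of an image feature map. The following is a minimal sketch of column-wise axial attention in plain Python, not the authors' implementation: the function name `column_axial_attention`, the use of the raw features as queries, keys, and values (no learned projections), and the toy input are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def column_axial_attention(feat):
    """Self-attention along the vertical axis only (illustrative sketch).

    feat[h][w] is a feature vector (list of floats) at row h, column w.
    Each position attends to every row of its own column, so it gathers
    context from above and below but never from other columns.
    Assumption: queries, keys and values are the raw features.
    """
    H, W = len(feat), len(feat[0])
    C = len(feat[0][0])
    scale = 1.0 / math.sqrt(C)
    out = [[None] * W for _ in range(H)]
    for w in range(W):
        column = [feat[h][w] for h in range(H)]
        for h in range(H):
            # Scaled dot-product scores between row h and all rows of column w.
            scores = [
                scale * sum(a * b for a, b in zip(column[h], other))
                for other in column
            ]
            weights = softmax(scores)
            # Weighted sum of the column's feature vectors.
            out[h][w] = [
                sum(weights[i] * column[i][c] for i in range(H))
                for c in range(C)
            ]
    return out


# Toy feature map: 3 rows, 2 columns, 2 channels (hypothetical values).
feat = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.2, 0.8]],
    [[1.0, 1.0], [0.9, 0.1]],
]
out = column_axial_attention(feat)
```

Because the attention is restricted to one axis, the cost per column grows with the square of the image height rather than with the square of the total number of pixels, which is the usual motivation for axial attention over full 2D attention.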

L2T-BEV: Local Lane Topology Prediction from Onboard Surround-View Cameras in Bird's Eye View Perspective

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior

Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors

Topology Preserving Local Road Network Estimation from Single Onboard Camera Image

An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection

Prior Based Online Lane Graph Extraction from Single Onboard Camera Image

BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

Online Lane Graph Extraction from Onboard Video

Predicting Maps Using In-Vehicle Cameras for Data-Driven Intelligent Transport

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

Monocular BEV Perception of Road Scenes Via Front-to-Top View Projection

CenterLineDet: CenterLine Graph Detection for Road Lanes with Vehicle-mounted Sensors by Transformer for HD Map Generation

Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps

LaneDAG: Automatic HD Map Topology Generator Based on Geometry and Attention Fusion Mechanism

Reconstruct from BEV: A 3D Lane Detection Approach based on Geometry Structure Prior

Understanding Bird's-Eye View of Road Semantics using an Onboard Camera

Separated RoadTopoFormer

TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes

Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving