Structured Bird’s-Eye View Road Scene Understanding from Surround Video

Peng Jia, Jianwei Gong, Yahui Jiang, Yuchun Wang, Yubo Zhang, Zhiyang Ju
DOI: https://doi.org/10.1109/iv55156.2024.10588512
2024-01-01
Abstract: Autonomous vehicles require an accurate understanding of the surrounding road scene for navigation. One crucial task in this understanding is bird’s-eye view (BEV) road network estimation. However, accurately extracting the BEV road network around the vehicle in complex scenes remains a challenge, given the variations in lane curvature and shape. This paper aims to accurately represent and learn the BEV road network around the vehicle for structured road scene understanding. Specifically, we propose a road network representation in which each lane centerline is an ordered point set and the road network is a directed graph; this representation accurately describes lane centerline instances and lane topological relationships in complex scenes. We then introduce an online road network estimation framework that takes onboard surround-view video as input and uses hierarchical query embedding to extract the BEV road network around the vehicle. Furthermore, we present a temporal aggregation module that flexibly incorporates information from historical frames to alleviate occlusion in road scenes and improve the accuracy of road network estimation. Finally, we conduct extensive experiments on the nuScenes dataset to validate the effectiveness of the proposed method in structured BEV road scene understanding.
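The representation described in the abstract (lane centerlines as ordered point sets, the road network as a directed graph) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `Centerline` and `RoadNetwork`, the methods `add_lane`, `connect`, and `reachable`, the lane identifiers, and the ego-centred metric coordinate convention are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from math import hypot

# Assumed convention: (x, y) in metres in an ego-centred BEV frame.
Point = tuple[float, float]


@dataclass
class Centerline:
    """One lane centerline instance: points ordered along the driving direction."""
    points: list[Point]

    def length(self) -> float:
        # Sum of straight-line distances between consecutive points.
        return sum(
            hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(self.points, self.points[1:])
        )


@dataclass
class RoadNetwork:
    """Directed graph: nodes are centerlines, edges are 'lane A leads into lane B'."""
    lanes: dict[str, Centerline] = field(default_factory=dict)
    successors: dict[str, set[str]] = field(default_factory=dict)

    def add_lane(self, lane_id: str, points: list[Point]) -> None:
        self.lanes[lane_id] = Centerline(list(points))
        self.successors.setdefault(lane_id, set())

    def connect(self, src: str, dst: str) -> None:
        # Edge direction encodes topology: traffic flows from src into dst.
        if src not in self.lanes or dst not in self.lanes:
            raise KeyError("both lanes must be added before connecting them")
        self.successors[src].add(dst)

    def reachable(self, start: str) -> set[str]:
        # Depth-first traversal over successor edges, excluding the start lane
        # unless a cycle leads back to it.
        seen: set[str] = set()
        stack = list(self.successors[start])
        while stack:
            lane_id = stack.pop()
            if lane_id not in seen:
                seen.add(lane_id)
                stack.extend(self.successors[lane_id])
        return seen


# Hypothetical fork: lane "a" splits into lanes "b" and "c".
net = RoadNetwork()
net.add_lane("a", [(0.0, 0.0), (0.0, 10.0)])
net.add_lane("b", [(0.0, 10.0), (3.0, 14.0)])
net.add_lane("c", [(0.0, 10.0), (-3.0, 14.0)])
net.connect("a", "b")
net.connect("a", "c")
```

Keeping the point order inside each centerline and the edge direction between centerlines separate mirrors the split the abstract draws between lane instances and lane topological relationships.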