Efficient and Hybrid Decoder for Local Map Construction in Bird'-Eye-View.

Kun Tian,Yun Ye,Zheng Zhu,Peng Li,Guan Huang
DOI: https://doi.org/10.1109/icra48891.2023.10161331
2023-01-01
Abstract:High-definition maps are crucial perception elements for autonomous robot navigation systems, which can provide accurate scene layout and environment information for downstream motion prediction and planning control tasks. Traditional methods based on manual annotation or SLAM algorithms require massive labor efforts and time costs, which hinders the deployment of practical applications. Online construction of local maps from on-board cameras offers an alternative solution. Aiming at the problems of unsatisfying precision and redundant computation of HDMapNet, we propose an efficient and hybrid decoder (EHD) that consists of a CNN-based segmentation (Seg) head and a query-based lane detection head (QLD). Specifically, the Seg head outputs pixel-level semantic maps, and QLD predicts instance mask for each lane object through learnable query embeddings. The designed decoding method eliminates the cumulative error caused by inaccurate semantic maps and does not require additional clustering algorithm for post-processing. Through combining with a variety of bird's-eye-view (BEV) encoders, the effectiveness and efficiency of our EHD is demonstrated by extensive experiments. For segmentation task, the mIoU scores of semantic map can be improved by 1.3%∼2.9%. Additionally, the accuracy of lane detection is also significantly increased (more than 10.2% mAP) under all evaluation criteria. Since our method discards redundant post-processing, the inference speed is up to 22.71 FPS, which is 32 times faster than HDMapNet.
What problem does this paper attempt to address?