HFT: Lifting Perspective Representations Via Hybrid Feature Transformation for BEV Perception.

Jiayu Zou,Zheng Zhu,Junjie Huang,Tian Yang,Guan Huang,Xingang Wang
DOI: https://doi.org/10.1109/icra48891.2023.10161214
2023-01-01
Abstract:Restoring an accurate Bird's Eye View (BEV) map plays a crucial role in the perception of autonomous driving. The existing works of lifting representations from frontal view to BEV can be classified into two categories, i.e., Camera model-Based Feature Transformation (CBFT) and Camera model-Free Feature Transformation (CFFT). We empirically analyze the significant differences between CBFT and CFFT. The former method lift perspective features based on the flat-world assumption, which often causes distortion of regions lying above the ground plane. The latter method is limited in the perception performance due to the absence of geometric priors and time-consuming computing. In this paper, we propose a novel framework with a Hybrid Feature Transformation module (HFT) to lift perspective representations. Furthermore, we design a mutual learning scheme to augment hybrid transformation. The deformable attention mechanism enables the model to pay more attention to relevant regions and capture features with more semantics. We illustrate the effectiveness of HFT in BEV perception tasks, such as segmentation and object detection. Notably, in the task of semantic segmentation, extensive experiments demonstrate that HFT outperforms the previous state-of-the-art method by relatively 17.9% on the Argoverse and 22.0% on the KITTI 3D Object dataset. With negligible computing budget, HFT outperforms existing image-based methods on 3D object detection. The code will be released soon.
What problem does this paper attempt to address?