Predicting Bird's-Eye-View Semantic Representations Using Correlated Context Learning

Yongquan Chen, Weiming Fan, Wenli Zheng, Rui Huang, Jiahui Yu
DOI: https://doi.org/10.1109/lra.2024.3384078
IF: 5.2
2024-01-01
IEEE Robotics and Automation Letters
Abstract: We redefine the concept of bird's-eye-view (BEV) imaging for machine cognition tasks, emphasizing its power as an image interpretation tool. Humans intuitively translate two-dimensional (2D) images into BEV representations by discerning and integrating spatial information, such as position and morphological aspects. Existing techniques focus primarily on improving accuracy in whole-to-whole mapping. However, this often results in a loss of global-local correlation, posing a significant challenge in predicting complex elements, such as multiscale dynamic objects and small-scale static objects in the distance. To address this issue, we propose correlated global-local spatial context learning (CGLSCL), one of the first attempts to amalgamate positional and morphological cues in translation for machine cognition tasks. Augmented by correlated learning, CGLSCL ensures more comprehensive BEV output, particularly for minor and fast-moving elements, which need to be captured more effectively than they are by existing methods. An evaluation of CGLSCL using the NuScenes and Argoverse 3D datasets demonstrated its superior performance compared to current state-of-the-art methods, particularly in predicting complex elements.