Improving Bird’s Eye View Semantic Segmentation by Task Decomposition

Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin
DOI: https://doi.org/10.1109/cvpr52733.2024.01469
2024-01-01
Abstract: Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, the RGB inputs and BEV targets come from distinct perspectives, which makes direct point-to-point prediction hard to optimize. In this paper, we decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment. In the first stage, we train a BEV autoencoder to reconstruct BEV segmentation maps from corrupted, noisy latent representations, which urges the decoder to learn fundamental knowledge of typical BEV patterns. The second stage maps RGB input images into the BEV latent space of the first stage, directly optimizing the correlations between the two views at the feature level. Our approach reduces the complexity of combining perception and generation by separating them into distinct steps, equipping the model to handle intricate and challenging scenes effectively. In addition, we propose to transform the BEV segmentation map from the Cartesian to the polar coordinate system to establish a column-wise correspondence between RGB images and BEV maps. Moreover, our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation, which reduces computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.
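The Cartesian-to-polar transformation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `bev_cartesian_to_polar`, the camera placement at the bottom centre of the map, the 90-degree field of view, the nearest-neighbour sampling, and the `ignore_label` value are all assumptions made for illustration. The idea shown is that each column of the polar map follows one viewing ray from the camera, so it can be associated with one image column.

```python
import numpy as np


def bev_cartesian_to_polar(bev, num_rays, num_ranges, fov_deg=90.0, ignore_label=255):
    """Resample a Cartesian BEV label map (H, W) onto a polar grid.

    Output shape is (num_ranges, num_rays): rows are distances from the
    camera (near to far), columns are ray angles (left to right).

    Assumed layout (hypothetical, not taken from the paper): the camera sits
    at the bottom-centre cell of the map and looks towards row 0.
    """
    h, w = bev.shape
    cam_row, cam_col = h - 1, (w - 1) / 2.0

    half_fov = np.deg2rad(fov_deg) / 2.0
    angles = np.linspace(-half_fov, half_fov, num_rays)   # one angle per polar column
    ranges = np.linspace(0.0, h - 1, num_ranges)          # one distance per polar row

    # r[i, j] and a[i, j] give the distance and angle of polar cell (i, j).
    r, a = np.meshgrid(ranges, angles, indexing="ij")

    # Nearest-neighbour lookup of the Cartesian cell hit by each polar cell.
    rows = np.rint(cam_row - r * np.cos(a)).astype(int)
    cols = np.rint(cam_col + r * np.sin(a)).astype(int)

    # Polar cells that fall outside the Cartesian map keep the ignore label.
    valid = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    polar = np.full((num_ranges, num_rays), ignore_label, dtype=bev.dtype)
    polar[valid] = bev[rows[valid], cols[valid]]
    return polar


if __name__ == "__main__":
    toy_bev = np.arange(25, dtype=np.uint8).reshape(5, 5)
    print(bev_cartesian_to_polar(toy_bev, num_rays=3, num_ranges=5))
```

In this sketch the middle column of the output traces the straight-ahead ray from the camera cell to the far edge of the map, while cells on the outer rays that leave the map are filled with `ignore_label`. A real pipeline would pick the angular range and resolution to match the camera and the image width.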