Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Zhu Junyu, Liu Lina, Tang Yu, Wen Feng, Li Wanlong, Liu Yong
DOI: https://doi.org/10.1109/icra57147.2024.10611420
2024-01-01
Abstract: Visual bird’s eye view (BEV) semantic segmentation helps autonomous vehicles understand the surrounding environment, including static elements (e.g., roads) and dynamic elements (e.g., vehicles, pedestrians), from front-view (FV) images alone. However, fully supervised methods depend on costly annotation, usually requiring HD maps, 3D object bounding boxes, and camera extrinsic matrices, which limits the capability of visual BEV semantic segmentation. In this paper, we present a novel semi-supervised framework for visual BEV semantic segmentation that boosts performance by exploiting unlabeled images during training. We propose a consistency loss that makes full use of unlabeled data by constraining the model on not only the semantic prediction but also the BEV feature. Furthermore, we propose a novel and effective data augmentation method named conjoint rotation, which augments the dataset while maintaining the geometric relationship between the FV images and the BEV semantic segmentation. Extensive experiments on the nuScenes dataset show that our semi-supervised framework effectively improves prediction accuracy. To the best of our knowledge, this is the first work that explores improving visual BEV semantic segmentation performance using unlabeled data. The code is available at https://github.com/Junyu-Z/Semi-BEVseg.
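The abstract does not state the exact form of the consistency loss. As a rough illustration only, the sketch below assumes mean squared error for both terms and two forward passes over the same unlabeled image (for example, under different augmentations). The names `mean_squared_error`, `consistency_loss`, and `feat_weight` are hypothetical and are not taken from the paper or its repository.

```python
# Illustrative sketch, NOT the authors' implementation.
# Assumption: both consistency terms are mean squared errors, and the
# inputs are flattened outputs of two passes over one unlabeled image.

def mean_squared_error(a, b):
    """Mean squared difference between two equally sized flat lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(pred_a, pred_b, feat_a, feat_b, feat_weight=1.0):
    """Penalize disagreement on semantic predictions and on BEV features.

    pred_a, pred_b: flattened per-cell class probabilities from two passes.
    feat_a, feat_b: flattened BEV feature values from the same two passes.
    feat_weight:    hypothetical weight balancing the feature term.
    """
    prediction_term = mean_squared_error(pred_a, pred_b)
    feature_term = mean_squared_error(feat_a, feat_b)
    return prediction_term + feat_weight * feature_term


if __name__ == "__main__":
    # Two passes that disagree slightly on both predictions and features.
    loss = consistency_loss([0.9, 0.1], [0.7, 0.3], [1.0, 2.0], [1.5, 2.0])
    print(loss)
```

Because no labels appear in this term, it can be computed on unlabeled images and added to the ordinary supervised loss on the labeled subset.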