Instance-Aware Monocular 3D Semantic Scene Completion
Haihong Xiao, Hongbin Xu, Wenxiong Kang, Yuqiong Li
DOI: https://doi.org/10.1109/tits.2023.3344806
IF: 8.5
2024-01-01
IEEE Transactions on Intelligent Transportation Systems
Abstract: We study outdoor 3D scene understanding, a challenging task that requires an intelligent system to infer both geometry and semantics from a single-view image, a critical skill for autonomous vehicles navigating the real 3D world. To this end, we present an instance-aware monocular semantic scene completion framework. To the best of our knowledge, this is the first work to specifically target instance perception in the camera-based semantic scene completion task. Our method consists of two stages. In stage I, we design a region-based VQ-VAE network that provides an effective solution for 3D occupancy prediction. In stage II, we first introduce an instance-aware attention module that explicitly incorporates instance-level cues captured from mask images to enhance the instance features in RGB images. We then use deformable cross-attention to aggregate the image features corresponding to each voxel query, and deformable self-attention to refine the query proposals. We combine these components and evaluate our method on two challenging datasets, SemanticKITTI and SSCBench-KITTI-360. The results show that our method outperforms the state-of-the-art VoxFormer-S. Specifically, it surpasses VoxFormer-S by 0.22 IoU and 0.72 mIoU on the SemanticKITTI validation set, and by 3.04 IoU and 1.06 mIoU on the SSCBench-KITTI-360 validation set. Our approach also perceives critical instances accurately, indicating its potential for practical deployment.
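The abstract reports results as IoU and mIoU. As a reading aid, the following minimal sketch shows how these two metrics are commonly computed for semantic scene completion on a single voxel grid: IoU measures binary occupancy (occupied vs. empty), and mIoU averages the per-class IoU over the semantic classes. This is not the authors' evaluation code. The function name `ssc_metrics`, the label conventions (`empty_label=0`, `ignore_label=255`), and the per-scene computation are assumptions made for illustration; benchmarks typically accumulate intersections and unions over the whole dataset before dividing.

```python
import numpy as np


def ssc_metrics(pred, target, num_classes, empty_label=0, ignore_label=255):
    """Return (IoU, mIoU) for one pair of integer voxel label grids.

    Assumed conventions (not taken from the paper): `empty_label` marks
    free space and `ignore_label` marks voxels excluded from evaluation.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop voxels that the ground truth marks as not evaluable.
    valid = target != ignore_label
    pred, target = pred[valid], target[valid]

    # Scene completion IoU: treat every non-empty class as "occupied".
    pred_occ = pred != empty_label
    target_occ = target != empty_label
    inter = np.logical_and(pred_occ, target_occ).sum()
    union = np.logical_or(pred_occ, target_occ).sum()
    iou = float(inter / union) if union > 0 else float("nan")

    # Semantic mIoU: average per-class IoU over non-empty classes
    # that appear in the prediction or the ground truth.
    per_class = []
    for c in range(num_classes):
        if c == empty_label:
            continue
        p, t = pred == c, target == c
        u = np.logical_or(p, t).sum()
        if u == 0:
            continue  # class absent from both grids; skip it
        per_class.append(np.logical_and(p, t).sum() / u)
    miou = float(np.mean(per_class)) if per_class else float("nan")

    return iou, miou


if __name__ == "__main__":
    # Toy example: six voxels, three labels (0 = empty), one ignored voxel.
    pred = [0, 1, 1, 2, 0, 2]
    target = [0, 1, 2, 2, 1, 255]
    print(ssc_metrics(pred, target, num_classes=3))
```

The reported gains (for example, 0.22 IoU and 0.72 mIoU) are differences in these quantities expressed in percentage points, so the values returned here would be multiplied by 100 to compare on the same scale.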
engineering, electrical & electronic, transportation science & technology, civil