Geometry-semantic Aware for Monocular 3D Semantic Scene Completion

Zonghao Lu, Bing Cao, Shuyin Xia, Qinghua Hu
DOI: https://doi.org/10.1016/j.patcog.2024.111030
2025-01-01
Abstract: Monocular Semantic Scene Completion (SSC) empowers intelligent devices to comprehend voxel occupancy (geometry) and semantics in 3D scenes, attracting significant attention in indoor and autonomous driving scenarios. However, existing monocular SSC models primarily map 2D images into 3D space, neglecting the potential benefits of leveraging semantic and geometric understanding in 2D. To address this, we propose the Proxy-embedding Parallel Multi-task Network (PPMNet), which perceives the geometry and semantics of 3D space through depth estimation and semantic segmentation proxy tasks on the 2D perspective plane. Moreover, 2D plane features can be inversely projected into 3D space and subsequently processed by the 3D network. In addition, we enhance contextual awareness in both perspective planes and voxel grids through parallel 2D and 3D decoders. Furthermore, we employ Dual-Head Pyramid Pooling (DHPP) to aggregate information from these two representations. Finally, considering the class imbalance and label incompleteness in practical data, we design a local-to-global loss to prioritize challenging categories. Extensive experiments validate the superiority of PPMNet over state-of-the-art methods on the NYUv2 and SemanticKITTI datasets. The code is available at: https://github.com/luzonghao1/PPMNet.
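The inverse projection of 2D features into 3D space mentioned in the abstract can be illustrated with a minimal sketch. This is not the PPMNet implementation: it assumes a standard pinhole camera model, a depth map already available per pixel, and simple averaging of features that land in the same voxel. The function name `backproject_to_voxels` and all of its parameters are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def backproject_to_voxels(
    depth: Sequence[Sequence[float]],               # H x W depth map (metres)
    features: Sequence[Sequence[Sequence[float]]],  # H x W x C per-pixel features
    fx: float, fy: float, cx: float, cy: float,     # assumed pinhole intrinsics
    voxel_size: float,
    grid_origin: Tuple[float, float, float],
    grid_dims: Tuple[int, int, int],
) -> Dict[Voxel, List[float]]:
    """Lift per-pixel features into a sparse voxel grid (illustrative only)."""
    sums: Dict[Voxel, List[float]] = {}
    counts: Dict[Voxel, int] = {}
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth
                continue
            # Pinhole back-projection: pixel (u, v) at depth z -> camera-frame point.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Quantize the 3D point to a voxel index.
            idx = tuple(
                int(math.floor((p - o) / voxel_size))
                for p, o in zip((x, y, z), grid_origin)
            )
            if any(i < 0 or i >= d for i, d in zip(idx, grid_dims)):
                continue  # point falls outside the grid
            feat = features[v][u]
            if idx not in sums:
                sums[idx] = [0.0] * len(feat)
                counts[idx] = 0
            for c, value in enumerate(feat):
                sums[idx][c] += value
            counts[idx] += 1
    # Average the features of all pixels that fell into the same voxel.
    return {idx: [s / counts[idx] for s in total] for idx, total in sums.items()}
```

Averaging is only one possible way to resolve several pixels mapping to one voxel; the abstract does not say how PPMNet handles this, and a learned 3D network would normally process the resulting grid further.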