From Front to Rear: 3D Semantic Scene Completion through Planar Convolution and Attention-based Network

Jie Li,Qi Song,Xiaohu Yan,Yongquan Chen,Rui Huang
DOI: https://doi.org/10.1109/tmm.2023.3234441
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract:Semantic Scene Completion (SSC) aims to reconstruct complete 3D scenes with precise voxel-wise semantics from the single-view incomplete input data, a crucial but highly challenging problem for scene understanding. Although SSC has seen significant progress due to the introduction of 2D semantic priors in recent years, the occluded parts, especially the rear-view of the scenes, are still poorly completed and segmented. To ameliorate this issue, we propose a novel deep learning framework for 3D SSC, named Planar Convolution and Attention-based Network (PCANet), to effectively extend high-precision predictions of the front-view surface to the rear-view occluded areas. Specifically, we decompose the traditional convolutional layer into three successive planar convolutions to form a Planar Convolution Residual (PCR) block, which maintains the planar features of the 3D scene. Afterward, the Planar Attention Module (PAM) is proposed to capture three different planar attentions and harvest the global context from the front surface to the rear occluded areas to improve the overall accuracy. Extensive experiments on the real NYU and NYUCAD datasets and the synthetic SUNCG-RGBD dataset demonstrate that our proposed framework can generate high-quality SSC results in both front and rear views and outperforms the state-of-the-art approaches trained in an end-to-end manner without additional data.
computer science, information systems,telecommunications, software engineering
What problem does this paper attempt to address?