Semantic Scene Completion Through Context Transformer and Recurrent Convolution

Wenlong Yang, Hongfei Yu, Yang Cao
DOI: https://doi.org/10.1109/access.2024.3401481
IF: 3.9
2024-05-24
IEEE Access
Abstract: Monocular semantic scene completion aims to predict a detailed 3D scene with semantic labels from a single image. To strengthen the image feature extraction of the classical network and thereby improve semantic scene completion, we propose a monocular semantic scene completion method based on a context transformer and recurrent residual convolution. A context transformer module is inserted between the encoder and decoder of the image feature extraction network. It uses contextual information to guide the learning of a dynamic attention matrix, which improves the network's visual representation ability. We also introduce a recurrent residual convolution module into the decoder that accumulates features across time steps, which helps distinguish similar objects. Experiments on the indoor dataset NYUv2 and the outdoor traffic scene dataset SemanticKITTI show that, compared with the baseline method, the mIoU of the semantic scene completion task improves by 5% and 8%, respectively.
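The abstract describes the recurrent residual convolution only as accumulating features across time steps. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it follows the recurrent residual unit popularized by R2U-Net, uses a single channel, hand-chosen 3×3 kernels, and no bias or normalization. The helper names `conv2d_same`, `recurrent_conv`, and `recurrent_residual_block` are hypothetical, and the paper's actual kernel sizes, channel counts, and number of time steps are not given here.

```python
import numpy as np


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Single-channel 2D cross-correlation with zero padding ('same' output size).
    Assumes an odd-sized kernel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def recurrent_conv(x: np.ndarray, w_f: np.ndarray, w_r: np.ndarray, steps: int) -> np.ndarray:
    """Hypothetical recurrent convolutional layer: the feed-forward response
    conv(x, w_f) is added at every step together with a recurrent term
    conv(h, w_r), so features from earlier time steps accumulate in h."""
    feed = conv2d_same(x, w_f)      # feed-forward input, reused at every step
    h = relu(feed)                  # time step 0: no recurrent contribution yet
    for _ in range(steps):
        h = relu(feed + conv2d_same(h, w_r))  # same recurrent kernel at each step
    return h


def recurrent_residual_block(x: np.ndarray, w_f: np.ndarray, w_r: np.ndarray,
                             steps: int = 2) -> np.ndarray:
    """Residual wrapper (assumed form): block output = input + recurrent features."""
    return x + recurrent_conv(x, w_f, w_r, steps)
```

Because the recurrent kernel `w_r` is shared across steps, increasing `steps` enlarges the effective receptive field without adding parameters, which is one plausible reason such a module can help separate similar-looking objects.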
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic