Abstract: Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in an entire 3D scene from limited observations, and is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions because cameras offer richer visual cues and are cost-effective. However, existing methods usually rely on sophisticated and heavy 3D models to process the lifted 3D features directly, even though these features are not discriminative enough to yield clear segmentation boundaries. In this paper, we adopt a dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, that propagates semantics from semantic-aware seed voxels to the whole scene based on spatial geometry cues. First, to exploit depth-aware context and dynamically select sparse seed voxels, we redesign the sparse voxel proposal network to directly process points generated by depth prediction in a coarse-to-fine manner. Furthermore, by designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial geometry cues, we enhance the feature separation between different categories and expedite the convergence of semantic propagation. Finally, we devise a multi-scale semantic propagation module that provides flexible receptive fields while reducing computational cost. Extensive experimental results on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate the superiority of SGN over existing state-of-the-art methods. Even our lightweight version, SGN-L, achieves notable scores of 14.80% mIoU and 45.45% IoU on the SemanticKITTI validation set with only 12.5 M parameters and 7.16 G training memory. Code is available at https://github.com/Jieqianyu/SGN.

From Front to Rear: 3D Semantic Scene Completion through Planar Convolution and Attention-based Network

Attention-based Multi-modal Fusion Network for Semantic Scene Completion

Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion

Omnisupervised Omnidirectional Semantic Segmentation

Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion

Semantic Scene Completion with Cleaner Self

2D Semantic-Guided Semantic Scene Completion

Anisotropic Convolutional Neural Networks for RGB-D based Semantic Scene Completion

3D Sketch-aware Semantic Scene Completion Via Semi-supervised Structure Prior

Camera-based 3D Semantic Scene Completion with Sparse Guidance Network

SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net

Instance-Aware Monocular 3D Semantic Scene Completion

DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion

Semantic Point Completion Network for 3D Semantic Scene Completion

Three Cars Approaching within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-based Semantic Scene Completion

SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion

AEFF-SSC: An Attention-Enhanced Feature Fusion for 3D Semantic Scene Completion

Towards 3D Semantic Scene Completion for Autonomous Driving: A Meta-Learning Framework Empowered by Deformable Large-Kernel Attention and Mamba Model

PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving