Abstract: Scene understanding is a significant research topic in computer vision, especially for robots that must interpret their environment intelligently. Semantic scene segmentation helps a robot identify the objects present in its surroundings, while semantic scene completion enhances its ability to infer object shape, which is pivotal for several high-level tasks. With a dense Conditional Random Field (CRF), one key issue is how to construct long-range interactions between nodes using Gaussian pairwise potentials. Another is which effective and efficient inference algorithms can be adapted to solve the resulting optimization problem. In this paper, we address semantic scene segmentation and completion simultaneously, optimizing both with a dense CRF based on a single depth image only. First, we convert the single depth image into different down-sampled Truncated Signed Distance Function (TSDF) or flipped TSDF voxel formats, and formulate the pairwise potential terms with this representation. Second, we use the outputs of an end-to-end 3D convolutional neural network, SSCNet, to obtain the unary potentials. Finally, we investigate the efficiency of different CRF inference algorithms (mean-field inference, the negative semi-definite specific difference of convex relaxation, the proximal minimization of linear programming and its variants, etc.). The proposed dense CRF and inference algorithms are evaluated on three datasets (SUNCG, NYU, and NYUCAD). Experimental results demonstrate that the voxel-level intersection over union (IoU) of the predicted voxel semantics and completion reaches the state of the art. Specifically, for voxel semantic segmentation, the highest IoU improvements are 2.6%, 1.3%, and 3.1%, and for scene completion, the highest IoU improvements are 2.5%, 3.7%, and 5.4%, on the SUNCG, NYU, and NYUCAD datasets, respectively.
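To make the role of the unary and Gaussian pairwise potentials concrete, the following is a minimal sketch of mean-field inference on a dense CRF over a handful of voxels. It is not the authors' implementation: the function name `mean_field_dense_crf`, the Potts label compatibility, the single spatial Gaussian kernel, and the parameters `sigma` and `weight` are all assumptions made for illustration, and the kernel is built naively in O(N^2) rather than with the efficient filtering a full-resolution voxel grid would likely require.

```python
import numpy as np

def mean_field_dense_crf(unary, coords, n_iters=5, sigma=1.0, weight=1.0):
    """Naive mean-field inference for a dense CRF (illustrative only).

    unary:  (N, L) negative log-probabilities, e.g. from a network's softmax
    coords: (N, 3) voxel coordinates used by the Gaussian kernel
    Returns an (N, L) array of approximate per-voxel label marginals.
    """
    # Gaussian kernel between every pair of voxels (dense connectivity).
    diff = coords[:, None, :] - coords[None, :, :]
    kernel = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    np.fill_diagonal(kernel, 0.0)  # a voxel sends no message to itself

    def softmax(x):
        x = x - x.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(x)
        return e / e.sum(axis=1, keepdims=True)

    q = softmax(-unary)  # initialise marginals from the unary term alone
    for _ in range(n_iters):
        # Message passing: kernel-weighted sum of neighbouring beliefs.
        msg = kernel @ q
        # Potts compatibility (assumed): label l is penalised by the
        # kernel-weighted mass that neighbours place on all other labels.
        pairwise = weight * (msg.sum(axis=1, keepdims=True) - msg)
        q = softmax(-unary - pairwise)
    return q
```

In this sketch, a voxel whose unary term only weakly favours one label can be pulled toward the label its confident neighbours agree on, which is the smoothing effect the long-range pairwise term is meant to provide.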

Semantic scene completion with point cloud representation and transformer-based feature fusion

Attention-based Multi-modal Fusion Network for Semantic Scene Completion

Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion

Dual-scale Point Cloud Completion Network Based on High-Frequency Feature Fusion

Object-Aware Semantic Scene Completion Through Attention-Based Feature Fusion and Voxel-Points Representation

2D Semantic-Guided Semantic Scene Completion

Semantic Scene Completion with Cleaner Self

Semantic Scene Completion Through Multi-Level Feature Fusion

CasFusionNet: A Cascaded Network for Point Cloud Semantic Scene Completion by Dense Feature Fusion

Semantic Point Completion Network for 3D Semantic Scene Completion

Not All Voxels Are Equal: Semantic Scene Completion from the Point-Voxel Perspective

Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

Semantic Scene Completion Through Context Transformer and Recurrent Convolution

PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion

Point Cloud Semantic Scene Completion from RGB-D Images

Semantic Scene Completion with Dense CRF from a Single Depth Image

Real-time 3D Semantic Scene Completion Via Feature Aggregation and Conditioned Prediction

Real-Time Semantic Scene Completion Via Feature Aggregation And Conditioned Prediction

CASSC: Context‐aware Method for Depth Guided Semantic Scene Completion

Voxel- and Bird's-Eye-View-Based Semantic Scene Completion for LiDAR Point Clouds

Resolution-switchable 3D Semantic Scene Completion