Abstract: Scene understanding is a significant research topic in computer vision, especially for robots that must interpret their environment intelligently. Semantic scene segmentation helps a robot identify the objects present in its surroundings, while semantic scene completion improves its ability to infer object shape, which is pivotal for several high-level tasks. With a dense Conditional Random Field (CRF), one key issue is how to construct the long-range interactions between nodes with Gaussian pairwise potentials. Another is which effective and efficient inference algorithms can be adapted to solve the resulting optimization problem. In this paper, we address semantic scene segmentation and completion jointly, optimizing both with a dense CRF from a single depth image only. Firstly, we convert the single depth image into different down-sampled Truncated Signed Distance Function (TSDF) or flipped TSDF voxel formats, and formulate the pairwise potential terms on this representation. Secondly, we use the outputs of an end-to-end 3D convolutional neural network named SSCNet to obtain the unary potentials. Finally, we investigate the efficiency of different CRF inference algorithms (mean-field inference, the negative semi-definite specific difference-of-convex relaxation, the proximal minimization of linear programming and its variants, etc.). The proposed dense CRF and inference algorithms are evaluated on three different datasets (SUNCG, NYU, and NYUCAD). Experimental results demonstrate that the voxel-level intersection over union (IoU) of the predicted voxel semantics and completion reaches state-of-the-art levels. Specifically, for voxel semantic segmentation, the highest IoU improvements are 2.6%, 1.3%, and 3.1%, and for scene completion, the highest IoU improvements are 2.5%, 3.7%, and 5.4%, on the SUNCG, NYU, and NYUCAD datasets respectively.
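To make the mean-field step mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Potts label compatibility, a single Gaussian kernel over voxel positions only, and a brute-force O(N²) kernel matrix that is practical only for tiny grids. The function names (`softmax`, `mean_field_dense_crf`) and the parameters `weight`, `theta`, and `n_iters` are hypothetical; in the setting described above, the unary term would come from the network's per-voxel class scores.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def mean_field_dense_crf(unary, positions, weight=1.0, theta=1.0, n_iters=5):
    """Naive parallel mean-field inference for a fully connected CRF.

    unary:     (N, L) array of unary potentials (negative log-probabilities).
    positions: (N, 3) array of voxel coordinates.
    Returns an (N, L) array of approximate marginals Q.
    Assumptions (not from the paper): Potts compatibility and one
    Gaussian kernel on positions, evaluated by brute force.
    """
    # Gaussian pairwise kernel k(i, j) = weight * exp(-|p_i - p_j|^2 / (2 theta^2)).
    diff = positions[:, None, :] - positions[None, :, :]
    kernel = weight * np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * theta ** 2))
    np.fill_diagonal(kernel, 0.0)  # a voxel sends no message to itself

    q = softmax(-unary)  # initialise Q from the unary term alone
    for _ in range(n_iters):
        # Message passing: msg[i, l] = sum_j k(i, j) * Q_j(l).
        msg = kernel @ q
        # Potts compatibility: label l is penalised by the mass on all other labels.
        penalty = msg.sum(axis=1, keepdims=True) - msg
        # Local update followed by normalisation.
        q = softmax(-unary - penalty)
    return q


# Three voxels in a row; the middle one weakly prefers label 1,
# while both neighbours strongly prefer label 0.
probs = np.array([[0.90, 0.10], [0.45, 0.55], [0.90, 0.10]])
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
marginals = mean_field_dense_crf(-np.log(probs), positions)
labels = marginals.argmax(axis=1)
```

In this toy example the pairwise term pulls the middle voxel toward the label of its neighbours. A practical dense CRF replaces the explicit kernel matrix with an efficient high-dimensional filtering step, which this sketch deliberately omits.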

Indoor Scene Recognition by Fusing Map-Level and Frame-Level Decisions with CRF

Overall Understanding of Indoor Scenes by Fusing Multiframe Local RGB-D Data Based on Conditional Random Fields

Object-aware Semantic Mapping of Indoor Scenes Using Octomap

Real-time scene category of indoor robot based on semantic mapping

Multi-class Indoor Semantic Segmentation with Deep Structured Model

Indoor Scene Recognition in 3D

Multi-View Fusion-Based 3D Object Detection for Robot Indoor Scene Perception

Outdoor scene understanding of mobile robot via multi-sensor information fusion

Regional Semantic Learning and Mapping Based on Convolutional Neural Network and Conditional Random Field

Indoor Instance-Aware Semantic Mapping Using Instance Segmentation

Indoor Semantic Scene Understanding using Multi-modality Fusion

What can i do around here? Deep functional scene understanding for cognitive robots

Urban Scene Segmentation with Laser-Constrained CRFs

Indoor Scene Recognition via Object Detection and TF-IDF

Indoor 3D Semantic Robot VSLAM Based on Mask Regional Convolutional Neural Network

An Indoor Scene Classification Method for Service Robot Based on CNN Feature

SRRM: Semantic Region Relation Model for Indoor Scene Recognition

3D Semantic Segmentation Algorithm for Indoor Scenes based on Long-term Memory

An Indoor Scene Recognition Method Combining Global and Saliency Region Features

Semantic Scene Completion with Dense CRF from a Single Depth Image

Recognition of Indoor Scenes Using 3-D Scene Graphs