Indoor Scene Recognition by Fusing Map-Level and Frame-Level Decisions with CRF

Shibo Gong, Yansong Gong, Longfei Su, Jing Yuan, Fengchi Sun
DOI: https://doi.org/10.1109/ccisp52774.2021.9639352
2021-01-01
Abstract: Recognizing the type of its surrounding scene is a fundamental capability for a service robot. However, because of its limited field of view, an indoor service robot usually cannot gather enough information about the scene from a single frame, and this lack of information may lead it to misjudge the category of the current scene. To address this problem, we propose a scene recognition model based on a conditional random field (CRF) that integrates map-level and frame-level scene recognition results. By fusing the two results, the model uses the information in the scene more fully to obtain the final scene recognition result. The model first uses SemanticFusion, a dense 3D semantic mapping model, to build a 3D semantic map from the images collected by the robot, and then uses the spatial relationships between objects in the semantic map to perform map-level scene recognition. Meanwhile, key frames are fed to a neural network to obtain frame-level scene recognition results. Finally, a conditional random field fuses the map-level and frame-level results. Scene recognition based on the 3D semantic map and scene recognition based on the conditional random field were tested on the Stanford 3D dataset and the NYU V2 dataset, respectively. The experimental results show that our model, which integrates map-level and frame-level decisions, outperforms both the map-level and the frame-level scene recognition models. In addition, the robot can use the 3D semantic map annotated with scene categories that is generated in this paper to better accomplish its tasks.
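The fusion step can be illustrated with a minimal sketch. The abstract does not specify the structure of the CRF, so everything below is an assumption made for illustration only: a chain over consecutive key frames, unary costs formed as a weighted sum of negative log-probabilities from the two recognizers, a Potts-style penalty for label changes between neighbouring key frames, and exact decoding with the Viterbi algorithm. The function name `fuse_chain_crf` and the parameters `w_frame`, `w_map`, and `switch_cost` are hypothetical and do not come from the paper.

```python
import math


def fuse_chain_crf(frame_probs, map_probs, w_frame=1.0, w_map=1.0, switch_cost=1.0):
    """Return the minimum-cost label sequence of an illustrative chain CRF.

    frame_probs[t][c]: frame-level probability of scene class c at key frame t.
    map_probs[t][c]:   map-level probability of class c for the same key frame.
    All weights and the chain structure are assumptions, not the paper's model.
    """
    eps = 1e-9  # avoids log(0)
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])

    def unary(t, c):
        # Weighted negative log-likelihood of both recognizers (assumed form).
        return -(w_frame * math.log(frame_probs[t][c] + eps)
                 + w_map * math.log(map_probs[t][c] + eps))

    def pairwise(prev, cur):
        # Potts penalty: neighbouring key frames prefer the same scene label.
        return 0.0 if prev == cur else switch_cost

    # Viterbi forward pass over the chain of key frames.
    cost = [unary(0, c) for c in range(n_classes)]
    backpointers = []
    for t in range(1, n_frames):
        new_cost, pointers = [], []
        for c in range(n_classes):
            best_prev = min(range(n_classes),
                            key=lambda p: cost[p] + pairwise(p, c))
            new_cost.append(cost[best_prev] + pairwise(best_prev, c) + unary(t, c))
            pointers.append(best_prev)
        cost = new_cost
        backpointers.append(pointers)

    # Backtrack from the cheapest final label.
    label = min(range(n_classes), key=lambda c: cost[c])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Toy example with two classes (0 = office, 1 = corridor); values are made up.
frame_probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
map_probs = [[0.7, 0.3], [0.7, 0.3], [0.7, 0.3]]

frame_only = fuse_chain_crf(frame_probs, map_probs, w_map=0.0, switch_cost=0.0)
fused = fuse_chain_crf(frame_probs, map_probs)
print(frame_only)  # [0, 1, 0]
print(fused)       # [0, 0, 0]
```

In this toy input, the frame-level scores alone mislabel the middle key frame, while adding the map-level evidence and the smoothness term yields a consistent label. This mirrors the motivation stated in the abstract but says nothing about the paper's actual potentials or results.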