Scene understanding using natural language description based on 3D semantic graph map

Jiyoun Moon, Beomhee Lee
DOI: https://doi.org/10.1007/s11370-018-0257-x
2018-08-17
Intelligent Service Robotics
Abstract: Natural language description of the working environment is an important component of human–robot communication. Although 3D semantic graph mapping has been widely studied for the perceptual aspects of the environment, these approaches rarely address communication issues such as generating natural language descriptions of a semantic graph map. In computer vision, there is much research on workspace understanding from images that automatically generates sentences, but it usually does not utilize multiple scenes or 3D information. In this paper, we introduce a novel natural language description method using a 3D semantic graph map. An object-oriented semantic graph map is first constructed using 3D information. A graph convolutional neural network and a recurrent neural network are then used to generate a description of the map. The resulting natural language sentence focuses on the objects in the 3D semantic graph map and can describe a single scene or multiple scenes. We validate the proposed method using a publicly available dataset and compare it with conventional methods.
robotics