Robotic Indoor Scene Captioning from Streaming Video

Xinghang Li, Di Guo, Huaping Liu, Fuchun Sun
DOI: https://doi.org/10.1109/icra48506.2021.9560904
2021-01-01
Abstract: Robots are usually equipped with cameras to explore indoor scenes, and it is desirable for a robot to describe the scene accurately in natural language. Although image and video captioning have achieved considerable success, especially on many public datasets, captions generated from indoor scene videos are still not informative or coherent enough. In this paper, we propose the problem of Indoor Scene Captioning from Streaming Video, which aims to generate more accurate and informative captions from streaming video. To solve this problem, we first design an algorithm that organizes the visual information of the indoor scene into a scene graph, and then implement a scene-graph-guided captioning method that takes the scene graph and video frames as input to generate the caption from the streaming video. The proposed framework is evaluated on both the AI2THOR dataset and a real-world robotic platform, demonstrating its effectiveness.
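To make the idea of organizing streaming visual information into a scene graph more concrete, here is a minimal sketch. It is not the authors' algorithm and it does not model their learned, scene-graph-guided captioner. It only assumes that some detector already provides, for each frame, object labels with 3D world positions and pairwise relation triples. Detections of the same label that fall within a hypothetical `merge_radius` are treated as the same object, so the graph grows incrementally as frames arrive. The `SceneGraph` class, `merge_radius`, and the template-based `describe` function are all illustrative names invented here, not part of the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str
    position: tuple  # (x, y, z) world coordinates; assumed to come from an upstream detector
    observations: int = 1


@dataclass
class SceneGraph:
    # Hypothetical threshold (metres) for treating two detections as one object.
    merge_radius: float = 0.5
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)  # (subject_id, predicate, object_id)

    def _find_or_add(self, label, position):
        """Return the id of a matching existing node, or create a new one."""
        for node in self.nodes:
            if node.label == label and math.dist(node.position, position) <= self.merge_radius:
                # Same object seen again: update its position as a running average.
                n = node.observations
                node.position = tuple((p * n + q) / (n + 1) for p, q in zip(node.position, position))
                node.observations += 1
                return node.node_id
        node = Node(len(self.nodes), label, tuple(position))
        self.nodes.append(node)
        return node.node_id

    def update(self, detections, relations):
        """Merge one frame into the graph.

        detections: list of (label, position) pairs for this frame.
        relations: list of (i, predicate, j), where i and j index into detections.
        """
        ids = [self._find_or_add(label, pos) for label, pos in detections]
        for i, predicate, j in relations:
            self.edges.add((ids[i], predicate, ids[j]))

    def to_triples(self):
        """Return the accumulated relations as sorted (label, predicate, label) triples."""
        return sorted(
            (self.nodes[s].label, p, self.nodes[o].label) for s, p, o in self.edges
        )


def describe(graph):
    """Toy template caption from graph triples (a stand-in for a learned captioner)."""
    return " ".join(f"The {s} is {p} the {o}." for s, p, o in graph.to_triples())


if __name__ == "__main__":
    g = SceneGraph()
    # Frame 1: an apple on a table.
    g.update([("apple", (1.0, 0.9, 2.0)), ("table", (1.0, 0.7, 2.0))], [(0, "on", 1)])
    # Frame 2: the same apple (slightly shifted estimate) and a newly seen chair.
    g.update([("apple", (1.1, 0.9, 2.0)), ("chair", (2.0, 0.4, 2.5))], [])
    print(len(g.nodes), describe(g))
```

In this toy run the second apple detection is merged into the existing node, leaving three objects (apple, table, chair) and a single relation. Accumulating across frames is what lets a streaming approach describe objects that never appear together in one frame; a real system would replace the label-plus-distance matching and the template caption with far more robust components.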