Paragraph Generation Network with Visual Relationship Detection.

Wenbin Che, Xiaopeng Fan, Ruiqin Xiong, Debin Zhao
DOI: https://doi.org/10.1145/3240508.3240695
2018-01-01
Abstract: Paragraph generation for images is a new task that aims to produce multiple sentences describing a given image. In this paper, we propose a paragraph generation network that incorporates visual relationship detection. We first detect regions that may contain important visual objects and then predict the relationships among them. Paragraphs are generated from the object regions that have a valid relationship with other regions. Compared with previous works, which generate sentences based on region features, we explicitly explore and use visual relationships to improve the final captions. The experimental results show that this strategy improves paragraph generation performance in two respects: more details about object relations are detected, and more accurate sentences are obtained. Furthermore, our model is more robust to fluctuations in region detection.
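The region-selection step described in the abstract (keep only the object regions that have a valid relationship with another region) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Relationship` record, the `regions_with_valid_relationships` helper, and the `min_score` threshold are all hypothetical names chosen for illustration, and "valid" is assumed here to mean "detector score at or above a threshold", which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical record for one predicted relationship between two regions."""
    subject: int    # index of the subject region
    predicate: str  # e.g. "riding", "next to"
    obj: int        # index of the object region
    score: float    # assumed confidence from some relationship detector


def regions_with_valid_relationships(num_regions, relationships, min_score=0.5):
    """Return sorted indices of regions that take part in at least one
    relationship whose score reaches min_score (an assumed validity rule)."""
    kept = set()
    for rel in relationships:
        in_range = 0 <= rel.subject < num_regions and 0 <= rel.obj < num_regions
        # Skip out-of-range indices, self-relations, and low-confidence pairs.
        if in_range and rel.subject != rel.obj and rel.score >= min_score:
            kept.add(rel.subject)
            kept.add(rel.obj)
    return sorted(kept)


# Toy input: five detected regions and three predicted relationships.
rels = [
    Relationship(0, "riding", 1, 0.9),
    Relationship(2, "next to", 3, 0.2),  # below the threshold, so ignored
    Relationship(1, "on", 4, 0.7),
]
selected = regions_with_valid_relationships(5, rels)
print(selected)  # [0, 1, 4]
```

In this toy input, regions 2 and 3 are dropped because their only relationship falls below the assumed threshold; only the remaining regions would be passed on to a sentence generator.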