Weakly-supervised Image Captioning Based on Rich Contextual Information

Hai-Tao Zheng, Zhe Wang, Ningning Ma, Jinyuan Chen, Xi Xiao, Arun Kumar Sangaiah
DOI: https://doi.org/10.1007/s11042-017-5236-2
IF: 2.577
2017-01-01
Multimedia Tools and Applications
Abstract: Automatic generation of an image description is a challenging task that attracts broad attention in artificial intelligence. Inspired by methods from computer vision and natural language processing, different approaches have been proposed to solve the problem. However, captions generated by existing approaches lack the contextual information needed to describe the corresponding images completely. The labeled captions in the training set describe images only at a basic level and lack sufficient contextual annotations. In this paper, we propose a Weakly-supervised Image Captioning Approach (WICA) to generate captions containing rich contextual information, without requiring complete annotations for the contextual information in datasets. We utilize encoder-decoder neural networks to extract basic captioning features and leverage object detection networks to identify contextual features. Then, we encode the two levels of features with a phrase-based language model in order to generate captions with rich contextual information. Comprehensive experimental results reveal that the proposed model outperforms existing baselines in terms of the richness and reasonability of contextual information for image captioning.