Relevance and Coherence Based Image Caption.

Tao Zhang, Wei Wang, Liang Wang, Qinghua Hu
DOI: https://doi.org/10.1007/978-981-10-7299-4_21
2017-01-01
Abstract: The attention-based image captioning framework has been widely explored in recent years. However, most techniques generate the next word conditioned on the previous words and the current visual content, without considering the relationship between the semantic and visual content. In this paper, we present a novel framework that explores relevance and coherence at the same time. Relevance captures the relationship between the semantic and visual content in a semantic-visual embedding space, while coherence maximizes the probability of generating the next word given the previous words and the current visual content. The model is evaluated on three benchmark datasets: Flickr8k, Flickr30k and MS COCO. The experimental results show that the proposed approach improves the performance of attention-based image captioning methods.