Combining Object-Based Attention And Attributes For Image Captioning

Cong Li, Jiansheng Chen, Weitao Wan, Tianpeng Li
DOI: https://doi.org/10.1007/978-3-319-71607-7_54
2017-01-01
Abstract: Image captioning has been a hot topic in computer vision and natural language processing. Recently, researchers have proposed many models for image captioning, which can be classified into two classes: visual-attention-based models and semantic-attribute-based models. In this paper, we propose a novel image captioning system which models the relationship between semantic attributes and visual attention. In addition, unlike traditional attention models, which do not use object detectors and instead learn a latent alignment between regions and words, we propose an object attention system which is capable of incorporating the information output by object detectors and can better attend to objects when generating the corresponding words. We evaluate our method on the MS COCO dataset, and our model outperforms many strong baselines.
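As a rough illustration of attending over detector outputs, the following minimal sketch applies generic dot-product soft attention to a handful of object feature vectors. It is not the paper's architecture: the function names `softmax` and `attend`, the toy feature values, and the use of a plain dot product as the scoring function are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(object_features, query):
    """Generic soft attention (illustrative, not the paper's model).

    object_features: one feature vector per detected object
                     (hypothetical stand-in for object detector outputs).
    query:           vector standing in for the decoder state when
                     generating the next word; same length as each feature.
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each object by its dot product with the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in object_features]
    weights = softmax(scores)
    dim = len(query)
    # The context vector is the attention-weighted average of the object features.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, object_features))
        for d in range(dim)
    ]
    return weights, context

# Toy, made-up features for three detected objects.
object_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
weights, context = attend(object_features, query)
```

With these toy values, the first object aligns best with the query and therefore receives the largest weight, so the context vector is dominated by its features.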