Neural Image Caption Generation With Global Feature Based Attention Scheme

Yongzhuang Wang, Hongkai Xiong
DOI: https://doi.org/10.1007/978-3-319-71589-6_5
2017-01-01
Abstract: The attention scheme is believed to align words with objects in the task of image captioning. Because the locations of objects vary across images, most attention schemes use a set of region features. Compared with the global feature, region features are lower-level features, but high-level features are preferable in image caption generation because words are high-level concepts. We therefore explore a new attention scheme based on the global feature, which can be appended directly to the original image caption generation model. We show that our global feature based attention scheme (GFA) achieves the same improvement as the traditional region feature based attention scheme, and that our model aligns words with different regions just as the traditional attention scheme does. We test our model on the Flickr8k and Flickr30k datasets.