A Novel Semantic Attribute-Based Feature for Image Caption Generation

Wei Wang, Yuxuan Ding, Chunna Tian
DOI: https://doi.org/10.1109/icassp.2018.8461507
2018-01-01
Abstract: Image captioning is challenging because it bridges computer vision and natural language processing. Generating a natural-language description requires perceiving not only the objects in an image but also their interrelations and context. In this paper, we propose extracting a novel visual feature weighted by salient semantic attributes, which is fed to the encoder of a Long Short-Term Memory (LSTM) network. Semantic attributes help exploit more semantics-related information in images and describe salient scenes, which improves the accuracy of the generated captions. Building on the Multiple Instance Learning (MIL) architecture with a VGG-16 network, we design transfer rules that map high-probability attributes onto the feature vector of the fc7 layer, yielding more semantics-related visual features. Our model effectively recognizes richer image details and achieves state-of-the-art performance on the MSCOCO 2014 dataset under standard metrics.
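The abstract does not spell out the transfer rules, so the following is only a minimal sketch of the general pipeline under stated assumptions. It assumes the noisy-OR formulation commonly used for MIL-based attribute detectors, in which an image contains an attribute unless every region misses it. For the transfer step it assumes a purely hypothetical rule: each selected attribute boosts the fc7 dimensions associated with it by a factor of (1 + p), and the result is then L2-normalized. The names `select_attributes`, `weight_fc7`, and `attr_to_dims`, along with the threshold and `k` values, are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import Dict, List, Sequence


def noisy_or(region_probs: Sequence[float]) -> float:
    """Image-level probability of one attribute from per-region probabilities.

    Noisy-OR MIL (assumed here): the image "bag" is positive unless every
    region instance misses the attribute.
    """
    miss = 1.0
    for p in region_probs:
        miss *= 1.0 - p
    return 1.0 - miss


def select_attributes(
    attr_probs: Dict[str, float], threshold: float = 0.5, k: int = 5
) -> Dict[str, float]:
    """Keep at most k attributes whose probability is at least `threshold`."""
    ranked = sorted(attr_probs.items(), key=lambda kv: kv[1], reverse=True)
    return {name: p for name, p in ranked[:k] if p >= threshold}


def weight_fc7(
    fc7: List[float],
    selected: Dict[str, float],
    attr_to_dims: Dict[str, List[int]],
) -> List[float]:
    """HYPOTHETICAL transfer rule: boost fc7 dimensions linked to each attribute.

    `attr_to_dims` is an assumed attribute -> fc7-index mapping; it is not
    the mapping defined in the paper.
    """
    out = list(fc7)
    for name, p in selected.items():
        for d in attr_to_dims.get(name, []):
            out[d] *= 1.0 + p  # scale by (1 + attribute probability)
    norm = math.sqrt(sum(x * x for x in out)) or 1.0  # guard the all-zero vector
    return [x / norm for x in out]  # unit-length feature for the LSTM input


# Toy usage: two regions per attribute and a 4-d stand-in for the 4096-d fc7 vector.
region_probs = {"dog": [0.6, 0.5], "frisbee": [0.3, 0.2], "car": [0.05, 0.05]}
attr_probs = {a: noisy_or(ps) for a, ps in region_probs.items()}
# dog ≈ 0.8, frisbee ≈ 0.44, car ≈ 0.0975
selected = select_attributes(attr_probs, threshold=0.4, k=2)  # keeps dog and frisbee
feature = weight_fc7([1.0, 1.0, 1.0, 1.0], selected, {"dog": [0], "frisbee": [2]})
```

Noisy-OR is used here because it lets a single confident region make the whole image positive, which is the intuition behind treating an image as a bag of region instances. Any other pooling, or the paper's own mapping onto fc7, could be swapped in without changing the overall shape of the pipeline.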