Stimulus-Driven and Concept-Driven Analysis for Image Caption Generation

Songtao Ding, Shiru Qu, Yuling Xi, Shaohua Wan
DOI: https://doi.org/10.1016/j.neucom.2019.04.095
IF: 6
2020-01-01
Neurocomputing
Abstract: Image captioning has recently made great progress in computer vision and artificial intelligence. However, language models still fail to achieve the desired results in high-level visual tasks, and generating accurate captions for a complex scene that contains multiple targets remains a challenge. To address these problems, we introduce the psychological theory of attention into image caption generation. We propose two types of attention mechanisms: stimulus-driven and concept-driven. Our attention model relies on a combination of a convolutional neural network (CNN) over images and a long short-term memory (LSTM) network over sentences. Comparison of experimental results shows that our proposed method achieves good performance on the MSCOCO test server.
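The abstract does not specify how the stimulus-driven and concept-driven mechanisms are computed, so the following is only a minimal sketch of a generic soft attention step of the kind commonly placed between a CNN encoder and an LSTM decoder. It is not the authors' method. The names `attend`, `region_features`, and `hidden_state` are hypothetical, the toy vectors stand in for CNN region features and an LSTM hidden state, and dot-product scoring is an assumed choice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Generic dot-product soft attention (an assumption, not the paper's model).

    region_features: list of per-region feature vectors (stand-in for CNN output).
    hidden_state: current decoder state (stand-in for an LSTM hidden state).
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each image region by its similarity to the decoder state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted average of the regions.
    dim = len(hidden_state)
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three 2-dimensional "regions" and one decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
print(weights, context)
```

In a full captioning model the context vector would be fed to the LSTM at each time step, so that each generated word can draw on a different part of the image.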