Attend to Knowledge: Memory-Enhanced Attention Network for Image Captioning.

Hui Chen, Guiguang Ding, Zijia Lin, Yuchen Guo, Jungong Han
DOI: https://doi.org/10.1007/978-3-030-00563-4_16
2018-01-01
Abstract: Image captioning, which aims to automatically generate sentences describing images, has been explored in many works. Attention-based methods have achieved impressive performance owing to their ability to adapt the image features to the context dynamically. Because recurrent neural networks have difficulty remembering information from far in the past, we argue that the attention model may not be adequately guided by distant previous information. In this paper, we propose a memory-enhanced attention model for image captioning, which aims to improve the attention mechanism with previously learned knowledge. Specifically, we store the visual and semantic knowledge that has been exploited in the past in memories, and generate a global visual or semantic feature to improve the attention model. We verify the effectiveness of the proposed model on two widely used benchmark datasets, MS COCO and Flickr30k. Comparison with state-of-the-art models demonstrates the superiority of the proposed model.
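The idea of storing past attended knowledge in a memory and using a global feature to guide attention can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the memory is read or how the global feature enters the attention model, so mean pooling, the additive query, dot-product scoring, and all names here (memory_enhanced_attention, mean_pool, regions, hidden, memory) are assumptions chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(vectors, dim):
    # Assumption: the global feature is the mean of all stored memory entries.
    if not vectors:
        return [0.0] * dim
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def memory_enhanced_attention(regions, hidden, memory):
    """One decoding step of a hypothetical memory-enhanced attention.

    regions: list of region feature vectors (all of length dim)
    hidden:  current decoder state (length dim)
    memory:  list of previously attended context vectors (mutated in place)
    """
    dim = len(hidden)
    global_feat = mean_pool(memory, dim)
    # Assumption: the global feature is added to the decoder state to form the query.
    query = [h + g for h, g in zip(hidden, global_feat)]
    weights = softmax([dot(query, r) for r in regions])
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    # Store what was attended to so later steps can consult it.
    memory.append(context)
    return context, weights
```

In this sketch, a later step whose decoder state carries little information still receives a non-uniform attention distribution, because the query is shifted by the pooled memory of earlier steps. That is the intuition the abstract describes, under the assumptions stated above.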