Hierarchical Deep Neural Network for Image Captioning

Su Yuting, Li Yuqian, Xu Ning, Liu An-An
DOI: https://doi.org/10.1007/s11063-019-09997-5
IF: 2.565
2019-01-01
Neural Processing Letters
Abstract: Automatically describing image content in natural language is a fundamental challenge for the computer vision community. Conventional methods generate sentences directly from visual information. However, visual information alone is not enough to generate fine-grained descriptions of a given image. In this paper, we exploit the fusion of visual information and high-level semantic information for image captioning. We propose a hierarchical deep neural network consisting of a bottom layer and a top layer. The former extracts visual and high-level semantic information from the image and its detected regions, respectively, while the latter integrates the two with an adaptive attention mechanism for caption generation. Experimental results show performance competitive with state-of-the-art methods on the MSCOCO dataset.
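
The abstract does not give the equations of the top layer, so the following is only a minimal sketch of how an adaptive attention step could blend visual and semantic context at one decoding step. It is not the authors' implementation. The function name `adaptive_fusion`, the bilinear attention scores, the scalar sigmoid gate, and all shapes and weight names (`W_v`, `W_s`, `w_gate`) are assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_fusion(visual, semantic, hidden, W_v, W_s, w_gate):
    """Hypothetical fusion step (assumed design, not from the paper).

    visual:   (R, D) features of R detected regions
    semantic: (K, D) embeddings of K high-level semantic concepts
    hidden:   (D,)   current decoder state
    """
    # Attention weights over regions and over semantic concepts,
    # each conditioned on the decoder state via a bilinear score.
    alpha = softmax(visual @ W_v @ hidden)    # (R,)
    beta = softmax(semantic @ W_s @ hidden)   # (K,)

    # Attended context vectors.
    v_ctx = alpha @ visual                    # (D,)
    s_ctx = beta @ semantic                   # (D,)

    # Assumed scalar gate in (0, 1) deciding how much to rely on
    # visual versus semantic context at this step.
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ hidden)))
    context = gate * v_ctx + (1.0 - gate) * s_ctx
    return context, alpha, beta, gate

# Usage with random placeholder inputs (no real image features).
rng = np.random.default_rng(0)
R, K, D = 5, 4, 8
context, alpha, beta, gate = adaptive_fusion(
    visual=rng.normal(size=(R, D)),
    semantic=rng.normal(size=(K, D)),
    hidden=rng.normal(size=D),
    W_v=0.1 * rng.normal(size=(D, D)),
    W_s=0.1 * rng.normal(size=(D, D)),
    w_gate=0.1 * rng.normal(size=D),
)
```

In a full captioning model the fused `context` would feed the word predictor at each step, and the weights would be learned rather than sampled at random.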