Dual-CNN: A Convolutional Language Decoder for Paragraph Image Captioning

Ruifan Li, Haoyu Liang, Yihui Shi, Fangxiang Feng, Xiaojie Wang
DOI: https://doi.org/10.1016/j.neucom.2020.02.041
IF: 6
2020-01-01
Neurocomputing
Abstract: Paragraph image captioning aims to generate a coherent paragraph describing a given image. However, because of their limited ability to capture long-term dependencies, decoders based on recurrent neural networks (RNNs) or long short-term memory (LSTM) struggle to generate satisfactory long-paragraph descriptions. In addition, sequential decoders are inefficient to train. Motivated by the advantages of convolutional neural networks (CNNs), in this paper we propose a Dual-CNN decoder with long-term memory ability and parallel computation, which can produce a semantically coherent paragraph for an image. Our Dual-CNN model is evaluated on the Stanford image-paragraph dataset. Extensive experiments demonstrate that Dual-CNN achieves results comparable with state-of-the-art models. Furthermore, the diversity and coherence of the generated paragraphs are analyzed to show the superiority of our approach.