A Unified Generation-Retrieval Framework for Image Captioning

Chunpu Xu, Wei Zhao, Min Yang, Xiang Ao, Wangrong Cheng, Jinwen Tian
DOI: https://doi.org/10.1145/3357384.3358105
2019-01-01
Abstract: Recent image captioning models are typically either generation-based or retrieval-based. Both approaches have advantages, but each is limited by its own disadvantages. In this paper, we propose a Unified Generation-Retrieval framework for Image Captioning (UGRIC) using adversarial learning. Unlike previous methods, the proposed UGRIC model leverages the informative content of the N-best response candidates provided by the retrieval-based model to enhance the generation-based method. In addition, to further improve the informativeness of the generated caption, we employ a copying mechanism to select words from the retrieved candidate captions and place them at appropriate positions in the output sequence. Experiments on the MSCOCO dataset demonstrate the effectiveness of the UGRIC model across various evaluation metrics. Code and data are available at: http://tinyurl.com/y6z2x6ho
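The copying step mentioned in the abstract can be illustrated with a generic gated mixture of a generation distribution and a copy distribution over retrieved tokens. This is only a minimal sketch of one common form of copy mechanism, not the UGRIC implementation: the function name `mix_copy_distribution`, the scalar gate `gen_gate`, and all numbers below are assumptions made for illustration.

```python
from collections import defaultdict


def mix_copy_distribution(p_vocab, copy_attention, retrieved_tokens, gen_gate):
    """Blend a generator's word distribution with a copy distribution.

    Hypothetical helper for illustration only (not from the paper).
    p_vocab:          dict mapping word -> generation probability (sums to 1)
    copy_attention:   attention weights over retrieved_tokens (sums to 1)
    retrieved_tokens: words taken from retrieved candidate captions
    gen_gate:         probability in [0, 1] of generating instead of copying
    """
    if len(copy_attention) != len(retrieved_tokens):
        raise ValueError("need one attention weight per retrieved token")
    if not 0.0 <= gen_gate <= 1.0:
        raise ValueError("gen_gate must lie in [0, 1]")

    mixed = defaultdict(float)
    # Probability mass assigned by the generator.
    for word, prob in p_vocab.items():
        mixed[word] += gen_gate * prob
    # Probability mass copied from retrieved captions; repeated tokens accumulate.
    for word, weight in zip(retrieved_tokens, copy_attention):
        mixed[word] += (1.0 - gen_gate) * weight
    return dict(mixed)


# Toy inputs (assumed values): "corgi" is absent from the generator's
# vocabulary but can still be produced by copying it from a retrieved caption.
p_vocab = {"a": 0.5, "dog": 0.3, "runs": 0.2}
retrieved = ["a", "corgi", "runs"]
attention = [0.1, 0.7, 0.2]

mixed = mix_copy_distribution(p_vocab, attention, retrieved, gen_gate=0.4)
best = max(mixed, key=mixed.get)
```

In this toy setting the word that receives the most probability is one that only appears in the retrieved caption, which is the kind of informativeness gain a copy mechanism is meant to provide. In a trained model the gate and attention weights would be predicted at each decoding step rather than fixed by hand.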