Object Modifier Generation for Image Captioning

Lidou Liao, Yonghong Song, Yuanlin Zhang
DOI: https://doi.org/10.1109/CAC51589.2020.9327182
2020-01-01
Abstract: Recently proposed encoder-decoder models have made great progress in image captioning. However, we find that the captions they generate ignore some objects that humans care about and mostly lack modifiers for objects. To solve these problems, we propose an Object Modifier Generation Mechanism (OMGM), which combines modifier vocabulary building with Multi-Task Learning (MTL). By building a modifier vocabulary, the OMGM-model emphasizes the described objects that humans care about most. MTL jointly trains text generation and modifier generation, where the auxiliary task makes it easier for the main task to learn more features of modifiers. We evaluate the model qualitatively and quantitatively on the Microsoft COCO dataset. Our OMGM-model achieves the best performance on most metrics compared to other models. These results also show quantitatively that OMGM helps generate better captions with more objects and modifiers.
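The joint training of a main task and an auxiliary task described in the abstract can be illustrated with a small sketch. The abstract does not give the loss formulation, so everything here is an assumption: the function names `cross_entropy` and `joint_loss`, the weighted-sum form, and the `aux_weight` parameter are hypothetical and are not taken from the paper. The sketch only shows one common way to combine a caption-generation loss with a modifier-generation loss.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def mean_loss(prob_rows, targets):
    """Average cross-entropy over a sequence of predictions."""
    losses = [cross_entropy(p, t) for p, t in zip(prob_rows, targets)]
    return sum(losses) / len(losses)


def joint_loss(caption_probs, caption_targets,
               modifier_probs, modifier_targets, aux_weight=0.5):
    """Hypothetical multi-task objective: main loss plus weighted auxiliary loss.

    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    # Main task: predict each caption token.
    caption_loss = mean_loss(caption_probs, caption_targets)
    # Auxiliary task: predict modifiers from an (assumed) modifier vocabulary.
    modifier_loss = mean_loss(modifier_probs, modifier_targets)
    return caption_loss + aux_weight * modifier_loss


# Toy usage: one caption token and one modifier prediction.
loss = joint_loss(
    caption_probs=[[0.5, 0.5]], caption_targets=[0],
    modifier_probs=[[0.25, 0.75]], modifier_targets=[0],
    aux_weight=0.5,
)
print(loss)
```

In a weighted-sum objective of this kind, gradients from the auxiliary modifier loss flow into the shared parameters, which is one plausible reading of how the auxiliary task could help the main task learn modifier features.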