Aligning Where to See and What to Tell: Image Captioning with Region-Based Attention and Scene-Specific Contexts

Kun Fu, Junqi Jin, Runpeng Cui, Fei Sha, Changshui Zhang
DOI: https://doi.org/10.1109/TPAMI.2016.2642953
IF: 23.6
2017-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image captioning system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones,...