Image Caption Generation with Local Semantic Information and Global Information.

Xing Liu, Weibin Liu, Weiwei Xing
DOI: https://doi.org/10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00152
2019-01-01
Abstract: In image description, different regions of an image play different roles: some key information is confined to a small region, while other important features must be extracted from the image as a whole. Conventional methods use only a CNN to extract image features and then generate the description from those features. However, this approach can easily overlook important information in the image. In this paper, we propose an image description method that combines the local information and the global features of an image. The local information is extracted by an object detection model (SSD), and the global features are extracted by multi-instance learning (MIL). Our model, which combines these two methods, performs well on the public MS-COCO dataset.
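The abstract does not say how the SSD output and the MIL features are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes noisy-OR pooling for MIL (one common formulation, where an image contains a word unless every region misses it: p(w | image) = 1 - prod_j (1 - p(w | region_j))) and simple concatenation of global word probabilities with per-class detector confidences. The names `noisy_or_mil`, `local_vector`, `fuse`, the `Detection` tuple layout, and the default `score_threshold=0.5` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical detection record: (class_label, confidence, (x1, y1, x2, y2)),
# loosely modeled on what an SSD-style detector returns after NMS.
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def noisy_or_mil(region_probs: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Pool per-region word probabilities into image-level probabilities.

    Noisy-OR MIL (an assumed choice): a word is absent from the image only if
    every region misses it, so p(w | image) = 1 - prod_j (1 - p(w | region_j)).
    """
    words = set()
    for region in region_probs:
        words.update(region)  # collect every word seen in any region
    pooled: Dict[str, float] = {}
    for word in sorted(words):
        miss = 1.0
        for region in region_probs:
            miss *= 1.0 - region.get(word, 0.0)  # regions without the word never fire
        pooled[word] = 1.0 - miss
    return pooled


def local_vector(detections: Sequence[Detection], classes: Sequence[str],
                 score_threshold: float = 0.5) -> List[float]:
    """Per-class maximum detector confidence, ignoring low-confidence boxes."""
    best = {c: 0.0 for c in classes}
    for label, score, _box in detections:
        if label in best and score >= score_threshold:
            best[label] = max(best[label], score)
    return [best[c] for c in classes]


def fuse(global_probs: Dict[str, float], vocab: Sequence[str],
         detections: Sequence[Detection], classes: Sequence[str],
         score_threshold: float = 0.5) -> List[float]:
    """Concatenate global (MIL) and local (detector) evidence into one vector
    that a caption decoder could consume, e.g. as its initial input."""
    global_part = [global_probs.get(w, 0.0) for w in vocab]
    return global_part + local_vector(detections, classes, score_threshold)


if __name__ == "__main__":
    # Toy inputs: two regions' word probabilities and three detector boxes.
    g = noisy_or_mil([{"dog": 0.5}, {"dog": 0.5, "ball": 0.2}])
    dets = [("dog", 0.9, (0, 0, 50, 50)),
            ("dog", 0.6, (10, 10, 40, 40)),
            ("person", 0.3, (0, 0, 5, 5))]
    print(fuse(g, ["ball", "dog"], dets, ["dog", "person"]))
```

With these toy inputs, "dog" appears in both regions, so its pooled probability (0.75) exceeds either region's 0.5; the low-confidence "person" box falls below the threshold and contributes 0. In a real model the concatenated vector would typically be projected by a learned layer before reaching an LSTM or similar decoder.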