Image Captioning Based on Global-Local Feature and Adaptive-Attention

Xiao-hu ZHAO, Liang-fei YIN, Cheng-long ZHAO
DOI: https://doi.org/10.3785/j.issn.1008-973x.2020.01.015
2020-01-01
Abstract: An image captioning algorithm was proposed in order to explore the difference between image visual features and upper-layer semantic concepts. The algorithm can determine the image focus, mine higher-level semantic information, and improve the detail of the description. Local features were added to the image visual feature extraction, and the global features and local features of the input image were combined into a global-local feature that carries the visual information. The focus of the image at each time step was then determined, so that more details of the image were captured. An attention mechanism was added to weight the image features during decoding, so that the dependence of each text word on the visual information and the semantic information at the current moment could be adjusted adaptively, which effectively improved image captioning performance. The experimental results show that the proposed method achieves captioning results competitive with other image captioning algorithms. The method describes images more accurately and more comprehensively, and its recognition accuracy for tiny objects is higher than that of the other algorithms.
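The adaptive weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `adaptive_context`, the dot-product scoring, and the use of a single semantic vector competing with the visual features in one softmax are all assumptions made for illustration, since the abstract does not give the model equations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_context(hidden, global_feat, local_feats, semantic_vec):
    """Hypothetical adaptive attention step for one decoding time step.

    hidden       -- decoder state at the current time step
    global_feat  -- one vector describing the whole image
    local_feats  -- list of vectors, one per image region
    semantic_vec -- vector standing in for semantic (language) information
    Returns (context, beta), where beta is the weight on semantic_vec.
    """
    # Global-local feature: the global vector followed by the region vectors.
    visual = [global_feat] + local_feats
    candidates = visual + [semantic_vec]
    # Assumed scoring rule: similarity to the current decoder state.
    weights = softmax([dot(hidden, c) for c in candidates])
    beta = weights[-1]
    # Context vector: attention-weighted sum of all candidates.
    context = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(hidden))
    ]
    return context, beta


if __name__ == "__main__":
    context, beta = adaptive_context(
        hidden=[0.0, 0.0, 4.0],
        global_feat=[1.0, 0.0, 0.0],
        local_feats=[[0.0, 1.0, 0.0]],
        semantic_vec=[0.0, 0.0, 1.0],
    )
    print(context, beta)
```

In this sketch a large `beta` means the current word leans on semantic information, while a small `beta` means it leans on the global-local visual features, which mirrors the trade-off the abstract describes.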