Recurrent Attention LSTM Model for Image Chinese Caption Generation

Chaoying Zhang, Yaping Dai, Yanyan Cheng, Zhiyang Jia, Kaoru Hirota
DOI: https://doi.org/10.1109/SCIS-ISIS.2018.00134
2018-01-01
Abstract: A Recurrent Attention LSTM (RAL) model is proposed for generating Chinese image captions. The model uses Inception-v4, a CNN developed by Google, to extract image features, while a recurrent attention LSTM mechanism determines the feature weights. Weighting the image regions in this way lets the model generate words more accurately. The proposed model therefore generates more relevant descriptions and improves the efficiency of the system. Compared with the Neural Image Caption (NIC) model, experimental results show that the proposed model improves performance by 1.8% on the BLEU-4 metric and 6.2% on the CIDEr metric on the AI Challenger Image Chinese Captioning dataset.
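The abstract does not spell out how the feature weights are computed. As a rough illustration only, the sketch below shows a generic additive soft-attention step of the kind commonly used in captioning models: each image region gets a score from its feature vector and the current LSTM hidden state, and a softmax turns the scores into weights. The function names (`softmax`, `attend`), the parameter matrices (`w_f`, `w_h`, `v`), and all dimensions are assumptions made for this example, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, hidden, w_f, w_h, v):
    """Generic additive attention (hypothetical, not the paper's exact form).

    features: R regions, each a list of D floats (e.g. CNN feature map cells)
    hidden:   LSTM hidden state, a list of H floats
    w_f:      D x A projection for region features (assumed parameter)
    w_h:      H x A projection for the hidden state (assumed parameter)
    v:        A-dimensional scoring vector (assumed parameter)
    Returns (weights over regions, weighted context vector of size D).
    """
    a_dim = len(v)
    # Project the hidden state once; it is shared by every region.
    h_proj = [sum(hidden[k] * w_h[k][j] for k in range(len(hidden)))
              for j in range(a_dim)]
    scores = []
    for region in features:
        f_proj = [sum(region[k] * w_f[k][j] for k in range(len(region)))
                  for j in range(a_dim)]
        scores.append(sum(v[j] * math.tanh(f_proj[j] + h_proj[j])
                          for j in range(a_dim)))
    weights = softmax(scores)
    d = len(features[0])
    # Context vector: weighted sum of the region features.
    context = [sum(weights[r] * features[r][k] for r in range(len(features)))
               for k in range(d)]
    return weights, context


# Toy dimensions chosen for illustration only.
random.seed(0)
R, D, H, A = 4, 6, 5, 3
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(R)]
hidden = [random.gauss(0, 1) for _ in range(H)]
w_f = [[random.gauss(0, 0.1) for _ in range(A)] for _ in range(D)]
w_h = [[random.gauss(0, 0.1) for _ in range(A)] for _ in range(H)]
v = [random.gauss(0, 1) for _ in range(A)]

weights, context = attend(features, hidden, w_f, w_h, v)
```

In a captioning loop, a step like this would typically run once per generated word, with the context vector fed into the LSTM alongside the previous word embedding, so that the weights over regions change as the sentence is produced.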