A Novel Image Caption Model Based on Transformer Structure

Shuang Wang, Yaping Zhu
DOI: https://doi.org/10.1109/icicse52190.2021.9404124
2021-01-01
Abstract: Image captioning algorithms based on the Transformer structure differ from the traditional encoder-decoder (codec) model built on convolutional neural networks and recurrent neural networks. Our proposed algorithm uses a convolutional neural network to extract image features and relies entirely on the self-attention mechanism to process those features and generate sentences describing the image content. Experimental results on the MSCOCO data set, evaluated with BLEU, CIDEr and other metrics, verify the validity of our method.
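To illustrate the self-attention operation the abstract refers to, the following is a minimal sketch of scaled dot-product self-attention applied to a handful of image-region feature vectors. It is not the authors' implementation: the function names, the toy feature values, and the use of identity query/key/value projections (instead of learned matrices) are all assumptions made to keep the example short and dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `features` is a list of equal-length vectors, e.g. one vector per
    region of a CNN feature map. Learned projection matrices are omitted
    (an assumption of this sketch, not something stated in the abstract).
    """
    d = len(features[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in features:
        # Similarity of this region (query) to every region (keys).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in features]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted sum of all region vectors (values).
        outputs.append([sum(wj * v[i] for wj, v in zip(w, features)) for i in range(d)])
    return outputs, weights


# Toy stand-in for CNN output: three image regions, four channels each.
regions = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
attended, attn = self_attention(regions)
```

Each row of `attn` is a probability distribution over the regions, so every output vector in `attended` is a mixture of all region features. In a full captioning model, layers of this kind would be stacked, with learned projections and multiple heads, in both the image encoder and the sentence decoder.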