Task-Adaptive Attention for Image Captioning
Chenggang Yan, Yiming Hao, Liang Li, Jian Yin, Anan Liu, Zhendong Mao, Zhenyu Chen, Xingyu Gao
DOI: https://doi.org/10.1109/tcsvt.2021.3067449
IF: 5.859
2022-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Attention mechanisms are now widely used in image captioning models. However, most attention models focus only on visual features. Generating syntax-related words requires little visual information, so in these cases such attention models can mislead word generation. In this paper, we propose a Task-Adaptive Attention module for image captioning, which alleviates this problem and learns implicit non-visual clues that help generate non-visual words. We further introduce a diversity regularization to enhance the expressive ability of the Task-Adaptive Attention module. Extensive experiments on the MSCOCO captioning dataset demonstrate that plugging our Task-Adaptive Attention module into a vanilla Transformer-based image captioning model improves performance.
engineering, electrical & electronic
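
The abstract does not spell out how the module is built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the non-visual clues are a small set of learnable vectors (here called `task_vectors`, a hypothetical name) that sit next to the visual features as extra attention slots, and that the diversity regularization penalizes similarity between those vectors. The function names and the squared-cosine penalty are illustrative assumptions.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_attention(query, visual_feats, task_vectors):
    """Scaled dot-product attention over visual features plus extra slots.

    Assumption: the extra slots (task_vectors) let the model place attention
    mass on something other than image regions when a word needs little
    visual evidence.
    """
    slots = visual_feats + task_vectors
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, s) / scale for s in slots])
    context = [
        sum(w * s[i] for w, s in zip(weights, slots))
        for i in range(len(query))
    ]
    # Share of attention that went to image regions rather than extra slots.
    visual_mass = sum(weights[: len(visual_feats)])
    return context, weights, visual_mass


def diversity_penalty(task_vectors):
    """Mean squared cosine similarity between distinct task vectors.

    Hypothetical regularizer: it is near 0 when the vectors are mutually
    orthogonal and near 1 when they all point in the same direction.
    """
    def cosine(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)

    n = len(task_vectors)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(cosine(task_vectors[i], task_vectors[j]) ** 2 for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    query = [1.0, 0.0]
    visual_feats = [[1.0, 0.0], [0.0, 1.0]]   # two toy image-region features
    task_vectors = [[0.5, 0.5]]               # one toy non-visual slot
    context, weights, visual_mass = adaptive_attention(query, visual_feats, task_vectors)
    print("attention weights:", weights)
    print("share on visual features:", visual_mass)
    print("diversity penalty:", diversity_penalty([[1.0, 0.0], [0.0, 1.0]]))
```

In a trained model the penalty would be added to the captioning loss with a small weight, and the query would come from the decoder state; both choices are assumptions here, since the abstract gives no such details.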