AOG-LSTM: An Adaptive Attention Neural Network for Visual Storytelling

Hanqing Liu, Jiacheng Yang, Chia-Hao Chang, Wei Wang, Hai-Tao Zheng, Yong Jiang, Hui Wang, Rui Xie, Wei Wu
DOI: https://doi.org/10.1016/j.neucom.2023.126486
IF: 6
2023-06-01
Neurocomputing
Abstract: Visual storytelling, the task of generating a story related to a given image sequence, has received significant attention. However, using general RNNs (such as LSTM and GRU) as the decoder limits the performance of models on this task, because they cannot differentiate between different types of information representations. In addition, optimizing the probabilities of subsequent words conditioned on the previous ground-truth sequences can cause error accumulation during inference. Moreover, the existing method of alleviating error accumulation, which replaces reference words, does not take into account the different effects of each word. To address these problems, we propose a modified neural network named AOG-LSTM and a modified training strategy named ARS. AOG-LSTM can adaptively pay appropriate attention to the different information representations within it when predicting different words. During training, ARS replaces some words in the reference sentences with model predictions, similar to the existing method. However, we use a selection network and a selection strategy to choose more appropriate words for replacement and thereby better improve the model. Experiments on the VIST dataset demonstrate that our model outperforms several strong baselines on the most commonly used metrics.
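The word-replacement idea that ARS builds on can be illustrated with a minimal sketch. This is not the paper's method: the abstract gives no details of the selection network or selection strategy, so the sketch below replaces words uniformly at random, in the style of generic scheduled sampling. The function name `replace_reference_words` and the parameter `replace_prob` are hypothetical names chosen for illustration.

```python
import random


def replace_reference_words(reference, predicted, replace_prob, rng=None):
    """Mix reference tokens with model predictions, position by position.

    Each reference token is swapped for the model's prediction at the same
    position with probability `replace_prob`. The mixed sequence would then
    be fed to the decoder as its conditioning history during training.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    rng = rng or random.Random()
    mixed = []
    for ref_tok, pred_tok in zip(reference, predicted):
        # ASSUMPTION: uniform random replacement. ARS instead relies on a
        # selection network and strategy (not specified in the abstract)
        # to decide which words are appropriate to replace.
        if rng.random() < replace_prob:
            mixed.append(pred_tok)
        else:
            mixed.append(ref_tok)
    return mixed


if __name__ == "__main__":
    ref = ["the", "family", "went", "to", "the", "beach"]
    pred = ["a", "family", "walked", "to", "the", "park"]
    print(replace_reference_words(ref, pred, replace_prob=0.3))
```

Exposing the decoder to its own predictions during training is what reduces the mismatch with inference, where ground-truth words are unavailable. Treating every position as equally replaceable, as this sketch does, is the limitation the abstract attributes to the existing method.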
computer science, artificial intelligence