DSAMR: Dual-Stream Attention Multi-hop Reasoning for knowledge-based visual question answering

Yanhan Sun, Zhenfang Zhu, Zicheng Zuo, Kefeng Li, Shuai Gong, Jiangtao Qi
DOI: https://doi.org/10.1016/j.eswa.2023.123092
IF: 8.5
2024-01-08
Expert Systems with Applications
Abstract: Knowledge-based visual question answering aims to associate external knowledge facts for answering questions about images. Most existing methods emphasize high-order associations between knowledge facts and questions, and fail to consider the negative effects of unnecessary knowledge facts in multi-hop reasoning. In this paper, we propose a Dual-Stream Attention Multi-hop Reasoning (DSAMR) architecture that constructs two different attention streams to mitigate unnecessary knowledge facts. This dual-stream mechanism enables the model to reduce the attention weights on unnecessary knowledge while gathering essential knowledge by learning the implicit correlations between knowledge facts and questions. In addition, we designed a hypergraph knowledge extraction module in the architecture to extract optimal knowledge facts by evaluating the relevance of each knowledge fact to the question. The experimental results demonstrate the effectiveness of our method not only on the knowledge-based visual question answering dataset KVQA, but also on the multi-hop question answering dataset PathQuestion.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science