Reciprocal Question Representation Learning Network for Visual Dialog

Zhang Hongwei, Wang Xiaojie, Jiang Si
DOI: https://doi.org/10.1007/s10489-022-03795-8
IF: 5.3
2022-01-01
Applied Intelligence
Abstract: The visual dialog task requires an agent to answer a series of questions based on an image and the dialog history. Biases are often observed when the agent over-relies on the dialog history, so balanced use of that history is crucial. Existing models usually drop several rounds of dialog history or learn a sparse dialog structure to address this over-reliance; however, bias may still exist in the selected dialog history. We therefore propose a new model, the reciprocal question representation learning network (RQRLN), which reduces bias from the dialog history by learning more accurate history-aware representations of questions. First, RQRLN adaptively selects favorable information at the token level from two representations of a question, one encoded with the dialog history and one without. The adaptive question representation is then combined with the corresponding image and passed to the final decoder. We also use a new entropy loss function that further reduces dialog-history-based bias by enabling the two representations of the same token to learn interactively. Experimental results on the VisDial v1.0 dataset show that the proposed model achieves state-of-the-art results in terms of normalized discounted cumulative gain (NDCG). We also demonstrate that our model exhibits less bias and infers more generic answers than models that use the entire history.
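The abstract does not say how the token-level selection is computed. As a rough illustration only, one common way to blend two representations of the same token is a sigmoid gate. The sketch below assumes that approach; the function name `fuse_tokens`, the weight vectors `w_plain` and `w_hist`, and the `bias` term are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_tokens(q_plain, q_hist, w_plain, w_hist, bias=0.0):
    """Blend two per-token representations of one question.

    q_plain: token vectors encoded WITHOUT dialog history.
    q_hist:  token vectors encoded WITH dialog history.
    w_plain, w_hist, bias: hypothetical gate parameters (assumed, not
    from the paper); in a real model they would be learned.

    Returns (fused_vectors, gates). A gate near 1 means the token leans
    on the history-aware vector; a gate near 0 means it leans on the
    history-free vector.
    """
    fused, gates = [], []
    for plain_vec, hist_vec in zip(q_plain, q_hist):
        # Scalar gate score computed from both views of the token.
        score = (
            sum(w * x for w, x in zip(w_plain, plain_vec))
            + sum(w * x for w, x in zip(w_hist, hist_vec))
            + bias
        )
        g = sigmoid(score)
        gates.append(g)
        # Convex combination of the two views, one gate per token.
        fused.append(
            [g * h + (1.0 - g) * p for p, h in zip(plain_vec, hist_vec)]
        )
    return fused, gates


# Two tokens, two dimensions each; zero weights give a neutral gate of 0.5.
q_plain = [[1.0, 0.0], [2.0, 2.0]]
q_hist = [[0.0, 1.0], [4.0, 0.0]]
fused, gates = fuse_tokens(q_plain, q_hist, [0.0, 0.0], [0.0, 0.0])
```

With zero gate weights every token gets an equal mix of both views. A trained gate would instead shift individual tokens toward whichever view is more useful, which is the kind of per-token balance between history-aware and history-free encodings that the abstract describes.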