Context Gating with Multi-Level Ranking Learning for Visual Dialog

Tangming Chen, Lianli Gao, Xiangpeng Li, Lei Zhao, Jingkuan Song
DOI: https://doi.org/10.1109/ICME52920.2022.9859849
2022-01-01
Abstract: Visual dialog aims to answer a sequence of consecutive questions based on an image and the dialog history. Most existing works resolve every question with ambiguous references (e.g., “she”) by consulting the dialog history, which introduces redundant information and leads to inaccurate results. In addition, they treat the task as classification, which ignores the diversity of valid answers and results in poor generalization. To tackle these problems, we propose a novel method, Context Gating with Multi-level Ranking Learning (CGMRL). Specifically, the proposed context gating considers both the question and the image to adaptively determine whether the history is needed to answer the question, which reduces the redundant or even noisy information introduced by the history. To improve the generalization capability of the model, a new constrained multi-level ranking learning scheme is proposed that encourages the model to consider all semantically correct options rather than only the ground-truth answer. Experiments on VisDial v1.0 show the superiority of the proposed method over other methods. The implementation code is published in an anonymous GitHub repository: https://github.com/sy742/CGMRL_.
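The abstract names two components without giving their equations, so the following is only a minimal sketch under stated assumptions, not the authors' CGMRL implementation (see the linked repository for that). It assumes the context gate is a scalar sigmoid computed from concatenated question and image features that scales how much of a history vector is added to the question vector, and it stands in for the constrained multi-level ranking learning with a generic pairwise hinge loss over graded option relevance (as in VisDial v1.0 dense annotations), where the required margin grows with the relevance gap. The names `context_gate`, `multilevel_ranking_loss`, `weights`, `bias`, and `margin` are hypothetical, and the "constrained" part of the paper's objective is not modeled.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Plain logistic function (no overflow guard; fine for moderate inputs)."""
    return 1.0 / (1.0 + math.exp(-x))


def context_gate(question: Sequence[float], image: Sequence[float],
                 history: Sequence[float], weights: Sequence[float],
                 bias: float = 0.0) -> List[float]:
    """Hypothetical context gate (assumption, not the paper's formula).

    A scalar g in (0, 1) is computed from the concatenation [question; image]
    and controls how much of the history vector is added to the question.
    """
    joint = list(question) + list(image)  # concatenate question and image features
    if len(weights) != len(joint):
        raise ValueError("weights must have len(question) + len(image) entries")
    if len(history) != len(question):
        raise ValueError("history and question must have the same length")
    g = sigmoid(sum(w * x for w, x in zip(weights, joint)) + bias)
    # g near 0 suppresses the history (question answerable from the image);
    # g near 1 lets the history through (e.g., to resolve "she").
    return [q + g * h for q, h in zip(question, history)]


def multilevel_ranking_loss(scores: Sequence[float], relevance: Sequence[float],
                            margin: float = 0.1) -> float:
    """Hypothetical pairwise hinge loss over graded relevance (assumption).

    For every pair where option i is more relevant than option j, option i
    should outscore option j by at least margin * (r_i - r_j), so larger
    relevance gaps demand larger score gaps. Returns the mean over such pairs.
    """
    total, pairs = 0.0, 0
    for s_i, r_i in zip(scores, relevance):
        for s_j, r_j in zip(scores, relevance):
            if r_i > r_j:
                required = margin * (r_i - r_j)
                total += max(0.0, required - (s_i - s_j))
                pairs += 1
    return total / pairs if pairs else 0.0
```

For example, with all-zero `weights` the gate is `sigmoid(0) = 0.5`, so `context_gate([1.0, 0.0], [0.5], [2.0, 2.0], [0.0, 0.0, 0.0])` returns `[2.0, 1.0]`. Scoring three options with relevance `[1.0, 0.5, 0.0]` as `[2.0, 1.0, 0.0]` gives zero loss at `margin=1.0`, while the reversed scores `[0.0, 1.0, 2.0]` give a loss of `2.0`. Unlike a softmax classifier that rewards only the single ground-truth option, this kind of loss also rewards partially relevant answers for ranking above irrelevant ones.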