Context-Aware Graph Inference with Knowledge Distillation for Visual Dialog

Dan Guo, Hui Wang, Meng Wang
DOI: https://doi.org/10.1109/tpami.2021.3085755
IF: 23.6
2021-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts. This task can be cast as relational inference in a graphical model with sparse contextual subjects (nodes) and unknown graph structure (relation descriptor); how to model the underlying context-aware relational inference is critical. To this end, we propose a novel context-aware graph (CAG) neural network. We focus on the exploitation of fine-grained relational reasoning with object-level dialog-historical co-reference nodes. The graph structure (relation in dialog) is iteratively updated using an adaptive top-K message passing mechanism. To eliminate sparse useless relations, each node has dynamic relations in the graph (different related K neighbor nodes), and only the most relevant nodes contribute to the context-aware relational graph inference. In addition, to avoid the performance degradation caused by the linguistic bias of the dialog history, we propose a pure visual-aware knowledge distillation mechanism named CAG-Distill, in which image-only visual clues are used to regularize the joint dialog-historical contextual awareness at the object level. Experimental results on the VisDial v0.9 and v1.0 datasets show that both CAG and CAG-Distill outperform comparative methods. Visualization results further validate the remarkable interpretability of our graph inference solution.
Categories: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
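
The adaptive top-K message passing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring or update functions, so everything below is an assumption for illustration only: relevance is a plain dot product between node features, messages are softmax-weighted sums over the K selected neighbors, and the update is an equal-weight average of a node and its message. The function names (`top_k_neighbors`, `message_passing_step`, `infer`) are hypothetical and are not taken from the paper.

```python
import math


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_neighbors(nodes, i, k):
    """Return (indices, scores) of the k nodes most relevant to node i.

    Assumption: relevance is a dot product; the node itself is excluded.
    Ties are broken by the lower node index.
    """
    scored = [(dot(nodes[i], nodes[j]), j) for j in range(len(nodes)) if j != i]
    scored.sort(key=lambda t: (-t[0], t[1]))
    top = scored[:k]
    return [j for _, j in top], [s for s, _ in top]


def message_passing_step(nodes, k):
    """One round: each node aggregates messages from its own top-k neighbors."""
    new_nodes = []
    for i, feat in enumerate(nodes):
        idx, scores = top_k_neighbors(nodes, i, k)
        weights = softmax(scores)
        message = [
            sum(w * nodes[j][d] for w, j in zip(weights, idx))
            for d in range(len(feat))
        ]
        # Assumed update rule: equal-weight average of node and message.
        new_nodes.append([0.5 * f + 0.5 * m for f, m in zip(feat, message)])
    return new_nodes


def infer(nodes, k=2, steps=2):
    """Run several rounds; neighbors are re-selected from updated features."""
    for _ in range(steps):
        nodes = message_passing_step(nodes, k)
    return nodes


if __name__ == "__main__":
    toy_nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(top_k_neighbors(toy_nodes, 0, 1))
    print(infer(toy_nodes, k=1, steps=1))
```

Because the neighbors are re-selected in every round from the updated features, each node's relations can change between iterations, which is the sense in which the graph structure in this sketch is dynamic. A faithful implementation would additionally condition the scores on the question and dialog history, which this toy version omits.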