Knowledge-Aware Causal Inference Network for Visual Dialog.

Zefan Zhang, Chunping Liu, Yi Ji
DOI: https://doi.org/10.1145/3591106.3592272
2023-01-01
Abstract: Effective knowledge and interaction across modalities are key to Visual Dialog. The classic graph-based framework, which connects the dialog history directly to the answer, fails to give the right answer because of the spurious guidance and strong bias induced by the dialog history. The recent causal inference framework, which removes this direct connection, improves generalization but at the cost of accuracy. In this work, we propose a novel Knowledge-Aware Causal Inference framework (KACI-Net), in which commonsense knowledge is introduced into the causal inference framework to achieve both high accuracy and good generalization. Specifically, commonsense knowledge is first generated from the entities extracted from the question and then fused with language and visual features through co-attention to obtain the final answer. Comparisons with a knowledge-unaware framework and a graph-based knowledge-aware framework on the VisDial v1.0 dataset show the superiority of our proposed framework and verify the effectiveness of commonsense knowledge for sound reasoning in Visual Dialog. High scores on both the NDCG and MRR metrics indicate a good trade-off between accuracy and generalization.
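The abstract reports NDCG and MRR without defining them. The following is a minimal sketch of the standard textbook definitions of both metrics for a ranked list of candidate answers. It is not the authors' evaluation code: the function names, the dictionary of relevance scores, and the cutoff `k` are illustrative assumptions, and the official VisDial evaluation may choose the cutoff differently.

```python
import math


def reciprocal_rank(ranked_ids, correct_id):
    """Return 1/rank of the correct answer (ranks start at 1), or 0.0 if absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == correct_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, correct_ids):
    """Average the reciprocal rank over all questions."""
    total = sum(reciprocal_rank(r, c) for r, c in zip(rankings, correct_ids))
    return total / len(rankings)


def dcg(relevances):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(ranked_ids, relevance, k):
    """NDCG@k: DCG of the predicted ranking divided by DCG of the ideal ranking.

    `relevance` maps candidate id -> graded relevance score (hypothetical input
    format); candidates missing from the mapping are treated as relevance 0.
    """
    gains = [relevance.get(c, 0.0) for c in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    rankings = [["a", "b", "c"], ["b", "a", "c"]]
    print(mean_reciprocal_rank(rankings, ["a", "a"]))
    print(ndcg(["c", "a", "b"], {"a": 1.0, "b": 0.5}, k=2))
```

MRR rewards placing the single ground-truth answer near the top, while NDCG gives credit for every candidate with non-zero relevance, which is why the abstract reads the pair as a proxy for accuracy versus generalization.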