SKANet - Structured Knowledge-Aware Network for Visual Dialog.

Lei Zhao, Lianli Gao, Yuyu Guo, Jingkuan Song, Heng Tao Shen
DOI: https://doi.org/10.1109/icme51207.2021.9428279
2021-01-01
Abstract: Visual dialog aims to generate an answer to each question based on an image and dialog history. Despite recent progress, the performance of existing methods still degrades in complex scenarios. Handling these scenarios depends on logical reasoning that requires common sense priors. In this paper, we propose a novel visual dialog pipeline, named Structured Knowledge-Aware Network (SKANet), consisting of a Multi-Modality Fusion Module, an Image Knowledge-Aware Module, and a Caption Knowledge-Aware Module. The Multi-Modality Fusion Module explores the textual context of the dialog history and the visual content. To deal with complex scenarios, the Image and Caption Knowledge-Aware Modules construct common sense knowledge graphs from ConceptNet. Experimental results on the VisDial v1.0 dataset show that our proposed method outperforms comparative methods.