Question-Guided Semantic Dual-Graph Visual Reasoning with Novel Answers.

Xinzhe Zhou, Yadong Mu
DOI: https://doi.org/10.1145/3460426.3463647
2021-01-01
Abstract: Visual Question Answering (VQA) has gained increasing attention as a cross-disciplinary research area spanning computer vision and natural language understanding. However, recent advances have mostly treated it as a closed-set classification problem, limiting the possible outputs to a fixed set of frequent answers from the training set. Although effective on benchmark datasets, this paradigm is inherently defective: a VQA model will always fail on a question whose correct answer lies outside the answer set, which severely hampers its generalization and flexibility. To close this gap, we explore an open-set VQA setting, in which models are evaluated on novel samples with unseen answers, given dynamic candidate answers from a candidate-generation module. For experimental purposes, we propose two oracle candidate-sampling strategies that serve as a proxy for the candidate-generation module and generate dynamic candidate answers for test samples. Because the conventional classification-based paradigm is no longer applicable in this setting, we design a matching-based VQA model, in which a novel Single-Source Graph Convolutional Network (SSGCN) module jointly leverages question guidance and dual semantic answer graphs to produce more discriminative and relevant answer embeddings. Extensive experiments and ablation studies on two re-purposed benchmark datasets demonstrate the effectiveness of the proposed model.
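To make the contrast between closed-set classification and matching concrete, the following is a minimal sketch of matching-based answer selection over a dynamic candidate set. It is not the paper's SSGCN model: the names `cosine` and `match_answer`, the use of cosine similarity as the matching score, and the toy three-dimensional vectors are all assumptions made for illustration. In a real system the query vector would come from a question-image encoder and the candidate vectors from an answer encoder.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_answer(query: Sequence[float],
                 candidates: Dict[str, Sequence[float]]) -> str:
    """Return the candidate answer whose embedding best matches the query.

    Unlike a fixed-size classifier head, the candidate set can change per
    sample, so answers never seen in training can still be selected.
    """
    if not candidates:
        raise ValueError("candidate set must not be empty")
    return max(candidates, key=lambda ans: cosine(query, candidates[ans]))


# Hypothetical toy embeddings (assumed values, not from the paper).
query = [0.9, 0.1, 0.2]
candidates = {
    "zebra": [1.0, 0.0, 0.1],
    "giraffe": [0.0, 1.0, 0.0],
    "okapi": [0.1, 0.2, 1.0],
}
best = match_answer(query, candidates)
```

Because scoring depends only on the embeddings supplied at test time, swapping in a different candidate dictionary requires no change to the model's output layer, which is the property the open-set setting relies on.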