Structured Semantic Representation for Visual Question Answering.

Dongchen Yu, Xing Gao, Hongkai Xiong
DOI: https://doi.org/10.1109/icip.2018.8451516
2018-01-01
Abstract: A number of models have been proposed to capture rich semantic representations in Visual Question Answering (VQA). In this paper, we illustrate the compositionality of general cognitive ability in VQA and take the linguistic structure of language into account in the semantic representation. We decompose the question into several components using a semantic tree and apply a tree-structured model to distill the sentence representation. In addition, we exploit the complementary images of the new dataset and optimize the classifier used to predict answers. We design a dual-path network for training on the new VQA 2.0 dataset, which leads the model to take effective advantage of the dataset's properties. Experiments show that our method obtains more useful information and improves performance.