Iterative Visual Relationship Detection via Commonsense Knowledge Graph

Hai Wan, Jinrui Liang, Jianfeng Du, Yanan Liu, Jialing Ou, Baoyi Wang, Jeff Z. Pan, Juan Zeng
DOI: https://doi.org/10.1016/j.bdr.2020.100175
IF: 3.3
2021-02-01
Big Data Research
Abstract: Scene Graph Generation, which discovers the interactions between pairs of entities in an image, plays a significant role in image understanding. Most recent studies consider only visual features and ignore the implicit effect of commonsense. We propose a novel model that takes advantage of commonsense knowledge in Scene Graph Generation, named Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC). IVRDC consists of two modules: a feature module that predicts predicates from visual and semantic features with a bi-directional recurrent neural network, and a commonsense knowledge module that constructs a specific commonsense knowledge graph for predicate prediction. The two modules are run iteratively and cross-feed their predictions to each other. The final predictions are made by combining the results of every iteration with an attention mechanism. Experimental results on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that the proposed model is competitive.
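The iterate-then-attend control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the functions `feature_module`, `knowledge_module`, and `iterate_and_attend` are hypothetical stand-ins, the commonsense knowledge graph is reduced to a fixed prior over predicates, and the attention weights are given as fixed logits rather than learned.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def feature_module(visual_scores, knowledge_scores):
    # Hypothetical stand-in: the paper uses a bi-directional RNN over visual
    # and semantic features; here we simply add the knowledge scores to the
    # visual scores and normalize.
    return softmax([v + k for v, k in zip(visual_scores, knowledge_scores)])


def knowledge_module(pred_probs, prior):
    # Hypothetical stand-in: the paper builds a commonsense knowledge graph;
    # here a fixed prior over predicates is reweighted by the current
    # predictions and returned as log-scores.
    return [math.log(p * q + 1e-9) for p, q in zip(pred_probs, prior)]


def iterate_and_attend(visual_scores, prior, attention_logits):
    """Run one round per attention logit, then combine all rounds."""
    knowledge_scores = [0.0] * len(visual_scores)
    per_iteration = []
    for _ in attention_logits:
        probs = feature_module(visual_scores, knowledge_scores)
        per_iteration.append(probs)
        # Cross-feed: the predictions drive the next knowledge scores.
        knowledge_scores = knowledge_module(probs, prior)
    # Assumption: attention is a softmax over fixed per-iteration logits.
    weights = softmax(attention_logits)
    final = [
        sum(w * probs[i] for w, probs in zip(weights, per_iteration))
        for i in range(len(visual_scores))
    ]
    return final, per_iteration


if __name__ == "__main__":
    final, rounds = iterate_and_attend(
        visual_scores=[1.0, 0.5, 0.1],   # illustrative predicate scores
        prior=[0.2, 0.7, 0.1],           # illustrative commonsense prior
        attention_logits=[0.0, 0.5, 1.0],
    )
    print(final)
```

Because the final prediction is a convex combination of the per-iteration distributions, it remains a valid probability distribution over predicates regardless of how many iterations are run.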
computer science, information systems, artificial intelligence, theory & methods