A Framework for Visual Relation Detection Exploiting Global Context.

Rui Sun, Xingfa Zhou, Xin Wang, Lan Yang, Huayi Zhan
DOI: https://doi.org/10.1109/cis54983.2021.00133
2021-01-01
Abstract: Visual relation detection (VRD) is crucial for comprehensive image understanding, which requires capturing the interactions between detected objects. However, inferring the relations between objects is challenging due to the lack of rich contextual or semantic information. Most previous work on VRD focuses on local context or simple semantic information. To gather richer information, we develop a dismountable VRD framework that combines global features with traditional local features from both vision and semantics. Specifically, we first investigate how to construct an efficient global context. We then propose a dual attention model (DAM) to gather the necessary information from the global context in different modalities. At the reasoning stage, four kinds of features are fused to predict pairwise relations between objects in the image. Experimental results on the Visual Genome (VG) dataset validate the effectiveness of our model.