Hierarchical Visual Relationship Detection

Xu Sun, Yuan Zi, Tongwei Ren, Jinhui Tang, Gangshan Wu
DOI: https://doi.org/10.1145/3343031.3350921
2019-01-01
Abstract: Acting as a bridge between vision and language, visual relationship detection (VRD) aims to represent the objects in an image and their interactions as a set of relationship triplets. However, the conventional VRD task gives little consideration to penalizing incorrect relationship predictions, which undermines its usefulness for image understanding applications. In this paper, we propose a novel VRD task named hierarchical visual relationship detection (HVRD), which encourages predicting abstract yet compatible relationship triplets when confidence in the specific image content is relatively low. HVRD can also handle the inevitable ambiguity of ground-truth annotation in VRD. For this task, we propose an HVRD method consisting of hierarchical object detection and hierarchical predicate detection. It detects hierarchical visual relationships by exploiting both an object concept hierarchy and a predicate concept hierarchy with order embedding. We also propose the first datasets for HVRD evaluation, H-VRD and H-VG, obtained by expanding the relationship category spaces of the VRD and VG datasets, respectively, into hierarchical ones. The experimental results show that our method outperforms the state-of-the-art baselines.
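As a rough illustration of the order-embedding idea mentioned in the abstract, the sketch below computes the standard order-violation penalty between a specific concept and a more general one. It is not the paper's implementation: the function name `order_violation`, the two-dimensional toy vectors, and the convention that general concepts lie closer to the origin are all assumptions made for this example.

```python
def order_violation(specific, general):
    """Order-violation penalty: sum over i of max(0, general[i] - specific[i])^2.

    The penalty is zero exactly when specific[i] >= general[i] in every
    coordinate, i.e. when the pair respects the assumed partial order in
    which more general concepts sit closer to the origin.
    """
    if len(specific) != len(general):
        raise ValueError("embeddings must have the same dimension")
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


# Hypothetical toy embeddings, chosen by hand; not taken from the paper.
animal = [1.0, 0.5]
dog = [2.0, 1.5]
car = [0.5, 3.0]

# "dog" dominates "animal" in every coordinate, so there is no violation.
print(order_violation(dog, animal))
# "car" falls below "animal" in the first coordinate, so the penalty is positive.
print(order_violation(car, animal))
```

In a trained model the embeddings would be learned so that this penalty is small for true ancestor pairs in the object and predicate hierarchies and large for unrelated pairs.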