Exploiting Attention For Visual Relationship Detection

Tongxin Hu,Wentong Liao,Michael Ying Yang,Bodo Rosenhahn
DOI: https://doi.org/10.1007/978-3-030-33676-9_23
2019-01-01
Abstract:Visual relationship detection targets on predicting categories of predicates and object pairs, and also locating the object pairs. Recognizing the relationships between individual objects is important for describing visual scenes in static images. In this paper, we propose a novel end-to-end framework on the visual relationship detection task. First, we design a spatial attention model for specializing predicate features. Compared to a normal ROI-pooling layer, this structure significantly improves Predicate Classification performance. Second, for extracting relative spatial configuration, we propose to map simple geometric representations to a high dimension, which boosts relationship detection accuracy. Third, we implement a feature embedding model with a bi-directional RNN which considers subject, predicate and object as a time sequence. We evaluate our method on three tasks. The experiments demonstrate that our method achieves competitive results compared to state-of-the-art methods.
What problem does this paper attempt to address?