Deep relational self-attention networks for scene graph generation

Ping Li, Zhou Yu, Yibing Zhan
DOI: https://doi.org/10.1016/j.patrec.2021.12.013
IF: 4.757
2022-01-01
Pattern Recognition Letters
Abstract: Scene graph generation (SGG) aims to simultaneously detect objects in an image and predict the relations among the detected objects. SGG is challenging because it requires modeling the contextualized relationships among all objects rather than considering only the relationships between paired objects. Most existing approaches address this problem with a CNN or RNN framework, which cannot explicitly and effectively model the dense interactions among objects. In this paper, we exploit the attention mechanism and introduce a relational self-attention (RSA) module to simultaneously model the object and relation contexts. By stacking such RSA modules in depth, we obtain a deep relational self-attention network (RSAN), which is able to characterize complex interactions, thus facilitating the understanding of object and relation semantics. Extensive experiments on the benchmark Visual Genome dataset demonstrate the effectiveness of RSAN.
computer science, artificial intelligence