RR-Net: Relation Reasoning for End-to-End Human-Object Interaction Detection
Dongming Yang, Yuexian Zou, Can Zhang, Meng Cao, Jie Chen
DOI: https://doi.org/10.1109/tcsvt.2021.3119892
IF: 5.859
2021-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects by inferring fine-grained triplets of ⟨human, verb, object⟩. Most HOI feature learning techniques depend on pre-detected instance regions or human body-part regions, which are computationally expensive and difficult to apply to end-to-end detectors in real applications. In this paper, building on an end-to-end HOI detector, we make a first attempt to explore region-independent relation reasoning for HOI detection. We first present a Relation-aware Frame, which provides a progressive structure for interaction inference. On top of the Relation-aware Frame, we design an Interaction Intensifier Module and a Correlation Parsing Module, in which: a) interactive semantics from humans are exploited and passed to objects to intensify interactions, and b) interactive correlations among humans, objects and interactions are integrated to improve predictions. Based on these modules, we construct a fully differentiable, end-to-end trainable network named the Relation Reasoning Network (RR-Net). Extensive experiments show that RR-Net achieves results competitive with state-of-the-art methods on both the V-COCO and HICO-DET benchmarks and improves over the baseline by about 7.6% and 11.1% in relative terms, respectively, indicating that this first effort at region-independent relation reasoning brings a clear improvement to end-to-end HOI detection.
engineering, electrical & electronic